It is a straightforward
matter to apply the classical algorithms for integers to problems
involving numbers with embedded radix points, or rational numbers,
or extended-precision floating-point numbers, in the same way
as the arithmetic operations defined for integers in \MIX\ are
applied to these more general problems.
In this section we shall study algorithms that do operations
(a), (b), and (c) above for integers expressed in radix $b$
notation, where $b$ is any given integer ≥ 2. Thus the algorithms
are quite general definitions of arithmetic processes, and as
such they are unrelated to any particular computer. But the
discussion in this section will also be somewhat machine-oriented,
since we are chiefly concerned with efficient methods for doing
high-precision calculations by computer. Although our examples
are based on the mythical \MIX\ computer, essentially the same
considerations apply to nearly every other machine. For convenience,
let us assume first that we have a computer (like \MIX) that
uses the signed-magnitude representation for numbers; suitable
modifications for complement notations are discussed near the
end of this section.
The most important fact to understand about extended-precision
numbers is that they may be regarded as numbers written in radix
$w$ notation, where $w$ is the computer's word size. For example,
an integer that fills 10 words on a computer whose word size
is $w = 10↑{10}$ has 100 decimal digits; but we will consider
it to be a 10-place number to the base $10↑{10}$. This viewpoint
is justified for the same reason that we may convert, say, from
binary to octal notation, simply by grouping the bits together.
(See Eq.\ 4.1--5.)
In these terms, we are given the following primitive operations
to work with:
a$↓0$) addition or subtraction of one-place integers,
giving a one-place answer and a carry;

b$↓0$) multiplication of a one-place integer by another
one-place integer, giving a two-place answer;

c$↓0$) division of a two-place integer by a one-place
integer, provided that the quotient is a one-place integer,
and yielding also a one-place remainder.

By adjusting the word size, if
necessary, nearly all computers will have these three operations
available, and so we will construct our algorithms (a), (b),
and (c) mentioned above in terms of the primitive operations
(a$↓0$), (b$↓0$), and (c$↓0$).
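By way of illustration, here is a minimal C sketch of the three primitives, under the assumption that the radix is $b = 2↑{16}$, so that a two-place quantity fits in an unsigned 32-bit integer; the names add0, mul0, and div0 are ad hoc inventions for this sketch.

#include <stdint.h>

#define B 65536u  /* assumed radix b = 2^16; one place = 16 bits */

/* (a0) one-place addition: returns the digit of the sum and a carry of 0 or 1 */
static uint32_t add0(uint32_t u, uint32_t v, uint32_t *carry) {
    uint32_t t = u + v;          /* t < 2b, so it fits comfortably */
    *carry = t / B;
    return t % B;
}

/* (b0) one-place multiplication, giving a two-place answer */
static uint32_t mul0(uint32_t u, uint32_t v) {
    return u * v;                /* less than b^2: a two-place value */
}

/* (c0) division of a two-place number t by a one-place number v,
   valid only when the quotient t/v is itself one place (t/v < b) */
static uint32_t div0(uint32_t t, uint32_t v, uint32_t *rem) {
    *rem = t % v;
    return t / v;
}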
Since we are visualizing extended-precision integers as base
$b$ numbers, it is sometimes helpful to think of the situation
when $b = 10$, and to imagine that we are doing the arithmetic
by hand. Then operation (a$↓0$) is analogous to memorizing the
addition table; (b$↓0$) is analogous to memorizing the multiplication
table; and (c$↓0$) is essentially memorizing the multiplication
table in reverse. The more complicated operations (a), (b),
(c) on high-precision numbers can now be done using the simple
addition, subtraction, multiplication, and long division procedures
we are taught in elementary school. In fact, most of the algorithms
we shall discuss in this section are essentially only mechanizations
of familiar pencil-and-paper operations. Of course, we must
state the algorithms much more precisely than they have ever
been stated in the fifth grade, and we should also attempt to
minimize computer memory and running time requirements.
To avoid a tedious discussion and cumbersome notations, let
us assume that all numbers we deal with are {\sl nonnegative.}
The additional work of computing the signs, etc., is quite straightforward,
and the reader will find it easy to fill in any details of this
sort.
First comes addition, which of course is very simple, but it
is worth studying since the same ideas occur in the other algorithms
also:
\algbegin Algorithm A (Addition of nonnegative integers).
Given nonnegative $n$-place integers $u↓1u↓2 \ldotsm u↓n$ and
$v↓1v↓2 \ldotsm v↓n$ with radix $b$, this algorithm forms their
sum, $(w↓0w↓1w↓2 \ldotsm w↓n)↓b$. (Here $w↓0$ is the ``carry,''
and it will always be equal to 0 or 1.)
\algstep A1. [Initialize.] Set $j ← n$, $k
← 0$. (The variable $j$ will run through the various digit positions,
and the variable $k$ keeps track of carries at each step.)
\algstep A2. [Add digits.] Set $w↓j ← (u↓j + v↓j + k)$ mod $b$,
and $k ← \lfloor (u↓j + v↓j + k)/b\rfloor$. (In other words,
$k$ is set to 1 or 0, depending on whether a ``carry'' occurred
or not, i.e., whether $u↓j + v↓j + k ≥ b$ or not. At most one
carry is possible during the two additions, since we always
have
$$u↓j + v↓j + k ≤ (b - 1) + (b - 1) + 1 < 2b.)$$
\algstep A3. [Loop on $j$.] Decrease $j$ by one.
Now if $j > 0$, go back to step A2; otherwise set $w↓0 ← k$
and terminate the algorithm.
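The same steps transliterate directly into C; here is a minimal sketch, under the same assumed conventions as the primitives above (radix $B = 2↑{16}$, digits stored from $u[1]$, most significant, through $u[n]$, least significant, with $w[0]$ reserved for the carry).

#include <stdint.h>
#define B 65536u                           /* assumed radix */

/* Algorithm A: w[0..n] <- u[1..n] + v[1..n] in radix B */
static void add_mp(const uint32_t u[], const uint32_t v[],
                   uint32_t w[], int n) {
    uint32_t k = 0;                        /* A1: k <- 0 */
    for (int j = n; j > 0; j--) {          /* A1/A3: j runs from n down to 1 */
        uint32_t t = u[j] + v[j] + k;      /* A2: t <= (B-1)+(B-1)+1 < 2B */
        w[j] = t % B;                      /* digit of the sum */
        k = t / B;                         /* carry, always 0 or 1 */
    }
    w[0] = k;                              /* A3: w0 <- final carry */
}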
For a formal proof that Algorithm A is valid, see exercise 4.
A \MIX\ program for this addition process might take the following
form:
\algbegin Program A (Addition of nonnegative integers).
Let LOC($u↓j$) ≡ U + $j$, LOC($v↓j$) ≡ V + $j$, LOC($w↓j$) ≡ W + $j$,
rI1 ≡ $j$, rA ≡ $k$, word size ≡ $b$, N ≡ $n$.

01        ENT1  N      1        A1. Initialize. j ← n.
02        JOV   OFLO   1        Ensure overflow is off.
03  1H    ENTA  0      N+1-K    k ← 0.
04        J1Z   3F     N+1-K    To A3 if j = 0.
05  2H    ADD   U,1    N        A2. Add digits.
06        ADD   V,1    N
07        STA   W,1    N
08        DEC1  1      N        A3. Loop on j.
09        JNOV  1B     N        If no overflow, set k ← 0.
10        ENTA  1      K        Otherwise, set k ← 1.
11        J1P   2B     K        To A2 if j ≠ 0.
12  3H    STA   W      1        Store final carry in w↓0.
The running time for this program is $10N + 6$ cycles,
independent of the number of carries, $K$. The
quantity $K$ is analyzed in detail at the close of this section.
Many modifications of Algorithm A are possible, and only a few
of these are mentioned in the exercises below. A chapter on
generalizations of this algorithm might be entitled, ``How to
design adding circuits for a digital computer.''
The problem of subtraction is similar to addition, but the differences
are worth noting:
\algbegin Algorithm S (Subtraction of nonnegative integers).
Given nonnegative $n$-place integers $u↓1u↓2 \ldotsm u↓n ≥ v↓1v↓2
\ldotsm v↓n$ with radix $b$, this algorithm forms their nonnegative
difference, $(w↓1w↓2 \ldotsm w↓n)↓b$.
\algstep S1. [Initialize.] Set $j ← n$, $k ← 0$.
\algstep S2. [Subtract digits.] Set $w↓j ← (u↓j - v↓j + k)$ mod
$b$, and $k ← \lfloor (u↓j - v↓j + k)/b\rfloor$. (In other
words, $k$ is set to $-1$ or 0, depending on whether a ``borrow''
occurred or not, i.e., whether $u↓j - v↓j + k < 0$ or not. In
the calculation of $w↓j$ note that we must have $-b = 0 - (b - 1) + (-1) ≤ u↓j - v↓j + k ≤ (b - 1) - 0 + 0 < b$; hence
$0 ≤ u↓j - v↓j + k + b < 2b$, and this suggests the method of
computer implementation explained below.)
\algstep S3. [Loop on $j$.] Decrease $j$ by one. Now if
$j > 0$, go back to step S2; otherwise terminate the algorithm.
(When the algorithm terminates, we should have $k = 0$; the
condition $k = -1$ will occur if and only if $v↓1 \ldotsm v↓n
> u↓1 \ldotsm u↓n$, and this is contrary to the given assumptions.
See exercise 12.)
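A companion C sketch of Algorithm S, under the same assumed conventions as before; a signed intermediate keeps the borrow $k$ equal to 0 or $-1$, just as in step S2.

#include <stdint.h>
#define B 65536                            /* assumed radix */

/* Algorithm S: w[1..n] <- u[1..n] - v[1..n], assuming u >= v */
static void sub_mp(const uint32_t u[], const uint32_t v[],
                   uint32_t w[], int n) {
    int32_t k = 0;                         /* S1: borrow, 0 or -1 */
    for (int j = n; j > 0; j--) {          /* S3: loop on j */
        int32_t t = (int32_t)u[j] - (int32_t)v[j] + k;   /* S2 */
        w[j] = (uint32_t)(t + B) % (uint32_t)B;  /* t + b lies in [0, 2b) */
        k = (t < 0) ? -1 : 0;              /* borrow occurred iff t < 0 */
    }
    /* here k = 0 whenever u >= v, as noted after step S3 */
}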
In a \MIX\ program to
implement subtraction, it is most convenient to retain the value
$1 + k$ instead of $k$ throughout the algorithm, so that we
can calculate $u↓j - v↓j + (1 + k) + (b - 1)$ in step S2. (Recall
that $b$ is the word size.) This is illustrated in the following
code:
\algbegin Program S (Subtraction of nonnegative integers).
This program is analogous to Program A; we have rI1 ≡ $j$, rA
≡ $1 + k$. Here, as in other programs of this section, location
WM1 contains the constant $b - 1$, word size minus one; cf.\ Program 4.2.3D.

01        ENT1  N      1      S1. Initialize. j ← n.
02        JOV   OFLO   1      Ensure overflow is off.
03  1H    J1Z   DONE   K+1    Terminate if j = 0.
04        ENTA  1      K      Set k ← 0.
05  2H    ADD   U,1    N      S2. Subtract digits.
06        SUB   V,1    N      Compute u↓j − v↓j + k + b.
07        ADD   WM1    N
08        STA   W,1    N      (May be minus zero.)
09        DEC1  1      N      S3. Loop on j.
10        JOV   1B     N      If overflow, set k ← 0.
11        ENTA  0      N−K    Otherwise, set k ← −1.
12        J1P   2B     N−K    Back to S2.
The running time for this program is
$12N + 3$ cycles, which is slightly longer than that for Program
A.
The reader may wonder if it would not be worth while to have
a combined addition-subtraction routine in place of the two
algorithms A and S. Study of the computer programs shows that
it is generally better to use two different routines, so that
the inner loop of the computation can be performed as rapidly
as possible, since the programs are so short.
Our next problem is multiplication, and here we carry the ideas
used in Algorithm A a little further:
\algbegin Algorithm M (Multiplication of nonnegative integers).
Given nonnegative integers $u↓1u↓2 \ldotsm u↓n$ and $v↓1v↓2 \ldotsm
v↓m$ with radix $b$, this algorithm forms their product $(w↓1w↓2
\ldotsm w↓{m+n})↓b$. (The conventional pencil-and-paper method is
based on forming the partial products $(u↓1u↓2 \ldotsm u↓n) \times
v↓j$ first, for $1 ≤ j ≤ m$, and then adding these products
together with appropriate scale factors; but in a computer it
is best to do the addition concurrently with the multiplication,
as described in this algorithm.)
\algstep M1. [Initialize.] Set $w↓{m+1}$,
$w↓{m+2}, \ldotss , w↓{m+n}$ all to zero. Set $j ← m$. (If $w↓{m+1},
\ldotss , w↓{m+n}$ were not cleared to zero in this step, we
would have a more general algorithm that sets
$$(w↓1 \ldotsm w↓{m+n}) ← (u↓1 \ldotsm u↓n) \times (v↓1 \ldotsm v↓m) + (w↓{m+1} \ldotsm w↓{m+n}).)$$

\algstep M2. [Zero multiplier?] If $v↓j = 0$, set
$w↓j ← 0$ and go to step M6. (This test saves a good deal of
time if there is a reasonable chance that $v↓j$ is zero, but
otherwise it may be omitted without affecting the validity of
the algorithm.)
\algstep M3. [Initialize $i$.] Set $i ← n$, $k ← 0$.
\algstep M4. [Multiply and add.] Set $t ← u↓i \times v↓j + w↓{i+j}
+ k$; then set $w↓{i+j} ← t$ mod $b$, $k ← \lfloor t/b\rfloor$.
(Here the ``carry'' $k$ will always be in the range $0 ≤ k < b$; see below.)
\algstep M5. [Loop on $i$.] Decrease $i$ by one. Now if
$i > 0$, go back to step M4; otherwise set $w↓j ← k$.

\algstep M6. [Loop on $j$.] Decrease $j$ by one. Now if
$j > 0$, go back to step M2; otherwise the algorithm terminates.
Algorithm M is illustrated in Table
1, assuming that $b = 10$, by showing the states of the computation
at the beginning of steps M5 and M6. A proof of Algorithm M
appears in the answer to exercise 14.
The two inequalities
$$0 ≤ t < b↑2,\qquad 0 ≤ k < b\eqno (1)$$
are crucial for an efficient implementation of
this algorithm, since they point out how large a register is
needed for the computations. These inequalities may be proved
by induction as the algorithm proceeds, for if we have $k < b$
at the start of step M4, we have
$$u↓i \times v↓j + w↓{i+j} + k ≤ (b - 1) \times (b - 1) + (b - 1) + (b - 1) = b↑2 - 1 < b↑2.$$
{\:r Table 1}
MULTIPLICATION OF 914 BY 84.

Step   i  j  u↓i  v↓j   t    w↓1 w↓2 w↓3 w↓4 w↓5
 M5    3  2   4    4   16     x   x   0   0   6
 M5    2  2   1    4   05     x   x   0   5   6
 M5    1  2   9    4   36     x   x   6   5   6
 M6    0  2   x    4   36     x   3   6   5   6
 M5    3  1   4    8   37     x   3   6   7   6
 M5    2  1   1    8   17     x   3   7   7   6
 M5    1  1   9    8   76     x   6   7   7   6
 M6    0  1   x    8   76     7   6   7   7   6
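In C the same computation reads as follows; this sketch (same assumed radix-$B$ conventions as the earlier sketches) uses a 64-bit $t$, making the invariant $t < b↑2$ of (1) visible in the types.

#include <stdint.h>
#define B 65536u                           /* assumed radix */

/* Algorithm M: w[1..m+n] <- u[1..n] * v[1..m] in radix B */
static void mul_mp(const uint32_t u[], const uint32_t v[],
                   uint32_t w[], int n, int m) {
    for (int i = 1; i <= n; i++)           /* M1: clear w[m+1..m+n] */
        w[m + i] = 0;
    for (int j = m; j > 0; j--) {          /* M6: loop on j */
        if (v[j] == 0) {                   /* M2: zero multiplier */
            w[j] = 0;
            continue;
        }
        uint64_t k = 0;                    /* M3: i <- n, k <- 0 */
        for (int i = n; i > 0; i--) {      /* M5: loop on i */
            uint64_t t = (uint64_t)u[i] * v[j] + w[i + j] + k;  /* M4 */
            w[i + j] = (uint32_t)(t % B);
            k = t / B;                     /* 0 <= k < B, by (1) */
        }
        w[j] = (uint32_t)k;                /* w_j <- k when i reaches 0 */
    }
}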
The following \MIX\ program shows
the considerations that are necessary when Algorithm M is implemented
on a computer. The coding for step M4 would be a little simpler
if our computer had a ``multiply-and-add'' instruction, or if
it had a double-length accumulator for addition.
\algbegin Program M (Multiplication of nonnegative
integers). This program is analogous to Program A.
rI1 ≡ $i$, rI2 ≡ $j$, rI3 ≡ $i + j$, CONTENTS(CARRY) ≡ $k$.

01        ENT1  N      1        M1. Initialize.
02        JOV   OFLO   1        Ensure overflow is off.
03        STZ   W+M,1  N        w↓{m+i} ← 0.
04        DEC1  1      N
05        J1P   *-2    N        Repeat for n ≥ i > 0.
06        ENT2  M      1        j ← m.
07  1H    LDX   V,2    M        M2. Zero multiplier?
08        JXZ   8F     M        If v↓j = 0, set w↓j ← 0 and go to M6.
09        ENT1  N      M−Z      M3. Initialize i.
10        ENT3  N,2    M−Z      i ← n, (i + j) ← n + j.
11        ENTX  0      M−Z      k ← 0.
12  2H    STX   CARRY  (M−Z)N   M4. Multiply and add.
13        LDA   U,1    (M−Z)N   u↓i
14        MUL   V,2    (M−Z)N   × v↓j
15        SLC   5      (M−Z)N   Interchange rA ↔ rX.
16        ADD   W,3    (M−Z)N   Add w↓{i+j} to lower half.
17        JNOV  *+2    (M−Z)N   Did overflow occur?
18        INCX  1      K        If so, carry one into upper half.
19        ADD   CARRY  (M−Z)N   Add k to lower half.
20        JNOV  *+2    (M−Z)N   Did overflow occur?
21        INCX  1      K'       If so, carry one into upper half.
22        STA   W,3    (M−Z)N   w↓{i+j} ← t mod b.
23        DEC1  1      (M−Z)N   M5. Loop on i.
24        DEC3  1      (M−Z)N   Decrease i and (i + j) by one.
25        J1P   2B     (M−Z)N   Back to M4 if i > 0; rX ≡ \lfloor t/b\rfloor.
26  8H    STX   W,2    M        Set w↓j ← k.
27        DEC2  1      M        M6. Loop on j.
28        J2P   1B     M        Repeat until j = 0.
The execution time of Program M depends on
the number of places, $M$, in the multiplier; the number of
places, $N$, in the multiplicand; the number of zeros, $Z$,
in the multiplier; and the number of carries, $K$ and $K↑\prime$,
which occur during the addition to the lower half of the product
in the computation of $t$. If we approximate both $K$ and $K↑\prime$
by the reasonable (although somewhat pessimistic) values ${1\over 2}(M - Z)N$,
we find that the total running time comes to
$28MN + 10M + 4N + 3 - Z(28N + 3)$ cycles. If step M2 were deleted,
the running time would be $28MN + 7M + 4N + 3$ cycles, so this
step is not advantageous unless the density of zero positions
within the multiplier is $Z/M > 3/(28N + 3)$. If the multiplier
is chosen completely at random, this ratio $Z/M$ is expected
to be only about $1/b$, which is extremely small; so step M2
is generally {\sl not} worth while.
Algorithm M is not the fastest way to multiply when $m$ and
$n$ are large, although it has the advantage of simplicity.
Speedier methods are discussed in Section 4.3.3; even when $m
= n = 4$, it is possible to multiply numbers in a little less
time than is required by Algorithm M.
The final algorithm of concern to
us in this section is long division, in which we want to divide
$(n + m)$-place integers by $n$-place integers. Here the ordinary
pencil-and-paper method involves a certain amount of guesswork
and ingenuity on the part of the person doing the division;
we must either eliminate this guesswork from the algorithm or
develop some theory to explain it more carefully.
A moment's reflection about the ordinary process of long division
shows that the general problem breaks down into simpler steps,
each of which is the division of an $(n + 1)$-place number $u$
by the $n$-place divisor $v$, where 0 ≤ $u/v < b$; the remainder
$r$ after each step is less than $v$, so we may use $rb + $(next
place of dividend) as the new $u$ in the succeeding step. For
example, if we are asked to divide 3142 by 47, we first divide
314 by 47, getting 6 and a remainder of 32; then we divide
322 by 47, getting 6 and a remainder of 40; thus we have a quotient
of 66 and a remainder of 40. It is clear that this same idea
works in general, and so our search for an appropriate division
algorithm reduces to the following problem (Fig.\ 6):
{\sl Let $u = u↓0u↓1 \ldotsm u↓n$ and $v = v↓1v↓2 \ldotsm v↓n$
be nonnegative integers in radix $b$ notation,
such that $u/v < b$. Find an algorithm to determine $q = \lfloor u/v\rfloor$.}
{\bf Fig.\ 6.} \xskip Wanted: a way to determine $q$ rapidly.
We may observe that the condition $u/v < b$ is equivalent to the condition that $u/b < v$; i.e., $\lfloor u/b\rfloor < v$; and this is the condition that $u↓0u↓1 \ldotsm u↓{n-1} < v↓1v↓2 \ldotsm v↓n$. Furthermore, if we write $r = u - qv$, then $q$ is the unique integer such that $0 ≤ r < v$.
The most obvious approach to this problem
is to make a guess about $q$, based on the most significant
digits of $u$ and $v$. It isn't obvious that such a method will
be reliable enough, but it is worth investigating; let us therefore
set
$$|spose 7q = \min\biglp \lfloor (u↓0b + u↓1)/v↓1\rfloor ,\ b - 1\bigrp .\eqno (2)$$
This formula says that $|spose 7q$ is obtained
by dividing the two leading digits of $u$ by the leading digit
of $v$; and if the result is $b$ or more we can replace it by
$(b - 1)$.
It is a remarkable fact, which we will now investigate, that
this value $|spose 7q$ is always a very good approximation to
the desired answer $q$, so long as $v↓1$ is reasonably large.
In order to analyze how close $|spose 7q$ comes to $q$, we will
first prove that $|spose 7q$ is never too small.
\thbegin Theorem A. {\sl In the notation
above, |spose 7q ≥ q.}
{\sl Proof.} \xskip Since $q ≤ b - 1$, the theorem
is certainly true if $|spose 7q = b - 1$. Suppose therefore
that $|spose 7q < b - 1$; it follows that $|spose 7q = \lfloor
(u↓0b + u↓1)/v↓1\rfloor$, hence $|spose 7qv↓1 ≥ u↓0b + u↓1 - v↓1 + 1$. Therefore
$$u - |spose 7qv ≤ u - |spose 7qv↓1b↑{n-1} ≤ u↓0b↑n + \cdots + u↓n - (u↓0b↑n + u↓1b↑{n-1} - v↓1b↑{n-1} + b↑{n-1}) = u↓2b↑{n-2} + \cdots + u↓n - b↑{n-1} + v↓1b↑{n-1} < v↓1b↑{n-1} ≤ v.$$
Since $u - |spose 7qv < v$, we must have $|spose 7q ≥ q$.
We will now prove that $|spose 7q$
cannot be much larger than $q$ in practical situations. Assume
that $|spose 7q ≥ q + 3$. We have
$$|spose 7q ≤ {u↓0b + u↓1\over v↓1} = {u↓0b↑n + u↓1b↑{n-1}\over v↓1b↑{n-1}} ≤ {u\over v↓1b↑{n-1}} < {u\over v - b↑{n-1}}.$$
(The case $v = b↑{n-1}$ is impossible, for if $v = (100 \ldotsm 0)↓b$
then $q = |spose 7q$.) Furthermore, since $q > (u/v) - 1$,
$$3 ≤ |spose 7q - q < {u\over v - b↑{n-1}} - {u\over v} + 1 = {u\over v}\left({b↑{n-1}\over v - b↑{n-1}}\right) + 1.$$
Therefore
$${u\over v} > 2\left({v - b↑{n-1}\over b↑{n-1}}\right) ≥ 2(v↓1 - 1).$$
Finally, since $b - 4 ≥ |spose 7q - 3 ≥ q = \lfloor u/v\rfloor ≥ 2(v↓1 - 1)$,
we have $v↓1 < \lfloor b/2\rfloor$. This proves Theorem B:
\thbegin Theorem B. {\sl If v↓1 ≥ \lfloor b/2\rfloor ,
then |spose 7q - 2 ≤ q ≤ |spose 7q.}
The most important part of this
theorem is that {\sl the conclusion is independent of b;} no
matter how large $b$ is, the trial quotient $|spose 7q$ will
never be more than 2 in error!
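Theorems A and B can be confirmed by brute force for a small radix; the following C check (an added illustration; the parameters $b = 16$ and $n = 2$ are arbitrary choices) verifies that $|spose 7q ≥ q$ always, and that $|spose 7q ≤ q + 2$ whenever $v↓1 ≥ \lfloor b/2\rfloor$.

#include <assert.h>
#include <stdio.h>

int main(void) {
    const int b = 16;                       /* small radix for exhaustion */
    for (int v1 = b/2; v1 < b; v1++)        /* normalized leading digit */
    for (int v2 = 0; v2 < b; v2++) {
        int v = v1*b + v2;                  /* v = (v1 v2)_b, n = 2 */
        for (int u = 0; u < b*v; u++) {     /* every u with u/v < b */
            int u0 = u/(b*b), u1 = (u/b) % b;
            int qhat = (u0*b + u1)/v1;      /* Eq. (2) */
            if (qhat > b - 1) qhat = b - 1;
            int q = u/v;
            assert(qhat >= q);              /* Theorem A */
            assert(qhat <= q + 2);          /* Theorem B */
        }
    }
    printf("Theorems A and B verified for b = 16, n = 2.\n");
    return 0;
}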
The condition that $v↓1 ≥ \lfloor b/2\rfloor$ is very much like
a normalization condition (in fact, it is exactly the condition
of normalization in a binary computer). One simple way to ensure
that $v↓1$ is sufficiently large is to multiply both $u$
and $v$ by $\lfloor b/(v↓1 + 1)\rfloor$; that does not change
the value of $u/v$, nor does it increase the number of places
in $v$, and exercise 23 proves that it will always make the
new value of $v↓1$ large enough. ({\sl Note:} For another
way to normalize the divisor, see exercise 28.)
Now that we have armed ourselves with all of these facts, we
are in a position to write the desired long division algorithm.
This algorithm uses a slightly improved choice of $|spose 7q$
in step D3, which guarantees that $q = |spose 7q$ or $|spose
7q - 1$; in fact, the improved choice of $|spose 7q$ made here
is almost always accurate.
\algbegin Algorithm D (Division of nonnegative
integers). Given nonnegative integers $u = u↓1u↓2 \ldotsm
u↓{m+n}$ and $v = v↓1v↓2 \ldotsm v↓n$ with radix $b$, where $v↓1
≠ 0$ and $n > 1$, we form the quotient $\lfloor u/v\rfloor =
(q↓0q↓1 \ldotsm q↓m)↓b$ and the remainder $u$ mod $v = (r↓1r↓2
\ldotsm r↓n)↓b$. (This notation is slightly different from that
used in the above proofs. When $n = 1$, the simpler algorithm
of exercise 16 should be used.)
\algstep D1. [Normalize.] Set $d ← \lfloor
b/(v↓1 + 1)\rfloor $. Set $u↓0u↓1u↓2 \ldotsm u↓{m+n}$ equal to
$u↓1u↓2 \ldotsm u↓{m+n}$ times $d$. Set $v↓1v↓2 \ldotsm v↓n$ equal
to $v↓1v↓2 \ldotsm v↓n$ times $d$. (Note the introduction of
the new digit position $u↓0$ at the left of $u↓1$; if $d = 1$,
all we need to do in this step is to set $u↓0 ← 0$. On a binary
computer it may be preferable to choose $d$ to be a power of
2 instead of using the value suggested here; any value of $d$
that results in $v↓1 ≥ \lfloor b/2\rfloor$ will suffice here.)
\algstep D2. [Initialize $j$.] Set $j ← 0$. (The loop on
$j$, steps D2 through D7, will be essentially a division of
$u↓ju↓{j+1} \ldotsm u↓{j+n}$ by $v↓1v↓2 \ldotsm v↓n$ to get a
single quotient digit $q↓j$; cf.\ Fig.\ 6.)
\algstep D3. [Calculate $|spose 7q$.] If $u↓j = v↓1$, set $|spose
7q ← b - 1$; otherwise set $|spose 7q ← \lfloor (u↓jb + u↓{j+1})/v↓1\rfloor$.
Now test if $v↓2|spose 7q > (u↓jb + u↓{j+1} - |spose 7qv↓1)b
+ u↓{j+2}$; if so, decrease $|spose 7q$ by 1 and repeat this
test. (The latter test determines at high speed most of the
cases in which the trial value $|spose 7q$ is one too large,
and it eliminates {\sl all} cases where $|spose 7q$ is two too
large; see exercises 19, 20, 21.)
\algstep D4. [Multiply and subtract.] Replace $u↓ju↓{j+1} \ldotsm
u↓{j+n}$ by $u↓ju↓{j+1} \ldotsm u↓{j+n}$ minus $|spose 7q$ times
$v↓1v↓2 \ldotsm v↓n$. This step (analogous to steps M3 to M5
of Algorithm M) consists of a simple multiplication by a one-place
number, combined with a subtraction. The digits $u↓ju↓{j+1}
\ldotsm u↓{j+n}$ should be kept positive; if the result of this
step is actually negative, $u↓ju↓{j+1} \ldotsm u↓{j+n}$ should
be left as the true value plus $b↑{n+1}$, i.e., as the $b$'s
complement of the true value, and a ``borrow'' to the left should
be remembered.
{\bf Fig.\ 7.} \xskip Long division.
\algstep D5. [Test remainder.] Set $q↓j ←
|spose 7q$. If the result of step D4 was negative, go to step
D6; otherwise go on to step D7.
\algstep D6. [Add back.] (The probability that this step is
necessary is very small, on the order of only $3/b$; see exercise
21. Test data that activates this step should therefore be
specifically contrived when debugging.) Decrease $q↓j$ by 1,
and add $0v↓1v↓2 \ldotsm v↓n$ to $u↓ju↓{j+1}u↓{j+2} \ldotsm u↓{j+n}$.
(A carry will occur to the left of $u↓j$, and it should be
ignored since it cancels with the ``borrow'' that occurred
in D4.)
\algstep D7. [Loop on $j$.] Increase $j$ by one. Now if
$j ≤ m$, go back to D3.
\algstep D8. [Unnormalize.] Now $q↓0q↓1 \ldotsm q↓m$ is the desired
quotient, and the desired remainder may be obtained by dividing
$u↓{m+1} \ldotsm u↓{m+n}$ by $d$.
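Putting the pieces together, here is a C sketch of Algorithm D under the radix-$B$ conventions of the earlier sketches ($B = 2↑{16}$, 1-origin arrays as in the text, with room for the extra digit $u↓0$ created in step D1); it is meant to mirror steps D1--D8, not to be a tuned routine.

#include <stdint.h>
#define B 65536u                              /* assumed radix */

/* Algorithm D: q[0..m] <- floor(u/v), and u[m+1..m+n] <- u mod v (after D8),
   where u = u[1..m+n], v = v[1..n], v[1] != 0, n > 1.
   u and v are modified in place by the normalization of step D1. */
static void div_mp(uint32_t u[], uint32_t v[], uint32_t q[], int m, int n) {
    uint32_t d = B / (v[1] + 1);              /* D1: normalize */
    uint64_t k = 0;
    for (int i = m + n; i >= 1; i--) {        /* u <- u*d, creating u[0] */
        uint64_t t = (uint64_t)u[i] * d + k;
        u[i] = (uint32_t)(t % B);  k = t / B;
    }
    u[0] = (uint32_t)k;
    k = 0;
    for (int i = n; i >= 1; i--) {            /* v <- v*d (no extra place) */
        uint64_t t = (uint64_t)v[i] * d + k;
        v[i] = (uint32_t)(t % B);  k = t / B;
    }
    for (int j = 0; j <= m; j++) {            /* D2, D7: loop on j */
        uint64_t num = (uint64_t)u[j] * B + u[j+1];
        uint64_t qhat, rhat;                  /* D3: calculate q-hat */
        if (u[j] == v[1]) qhat = B - 1; else qhat = num / v[1];
        rhat = num - qhat * v[1];
        while (rhat < B && qhat * v[2] > rhat * B + u[j+2]) {
            qhat--;  rhat += v[1];            /* repeat the test */
        }
        uint64_t borrow = 0;                  /* D4: multiply and subtract */
        for (int i = n; i >= 1; i--) {
            uint64_t sub = qhat * v[i] + borrow;
            uint32_t lo = (uint32_t)(sub % B);
            borrow = sub / B;
            if (u[j+i] < lo) { u[j+i] += (uint32_t)B - lo; borrow++; }
            else u[j+i] -= lo;
        }
        int negative = (u[j] < borrow);       /* borrow out of u_j? */
        u[j] = (uint32_t)((u[j] + B - borrow) % B);  /* b's complement if negative */
        q[j] = (uint32_t)qhat;                /* D5: test remainder */
        if (negative) {                       /* D6: add back (rare) */
            q[j] -= 1;
            uint32_t c = 0;
            for (int i = n; i >= 1; i--) {
                uint32_t t = u[j+i] + v[i] + c;
                u[j+i] = t % B;  c = t / B;
            }
            u[j] = (u[j] + c) % B;            /* carry cancels the borrow */
        }
    }
    k = 0;                                    /* D8: unnormalize */
    for (int i = 1; i <= n; i++) {            /* remainder <- u[m+1..m+n]/d */
        uint64_t t = k * B + u[m+i];
        u[m+i] = (uint32_t)(t / d);  k = t % d;
    }
}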
The representation
of Algorithm D as a \MIX\ program has several points of interest:
\algbegin Program D (Division of nonnegative integers).
The conventions of this program are analogous to Program A;
rI1 ≡ $i$, rI2 ≡ $j - m$, rI3 ≡ $i + j$. Steps D1 and D8 have
been left as exercises.
001  D1    JOV   OFLO       1           D1. Normalize.
     \cdot \cdot \cdot                  (See exercise 25)
039  D2    ENN2  M          1           D2. Initialize j.
040        STZ   V          1           Set v↓0 ← 0, for convenience in D4.
041  D3    LDA   U+M,2(1:5) M+1         D3. Calculate |spose 7q.
042        LDX   U+M+1,2    M+1         rAX ← u↓jb + u↓{j+1}.
043        DIV   V+1        M+1         rA ← \lfloor rAX/v↓1\rfloor.
044        JOV   1F         M+1         Jump if quotient = b.
045        STA   QHAT       M+1         |spose 7q ← rA.
046        STX   RHAT       M+1         |spose 7r ← u↓jb + u↓{j+1} − |spose 7qv↓1
047        JMP   2F         M+1             = (u↓jb + u↓{j+1}) mod v↓1.
048  1H    LDX   WM1                    rX ← b − 1.
049        LDA   U+M+1,2                rA ← u↓{j+1}. (Here u↓j = v↓1.)
050        JMP   4F
051  3H    LDX   QHAT       E
     \cdot \cdot \cdot      (N+1)(M+1)  rAX ← −|spose 7qv↓i.
071        SLC   5          (N+1)(M+1)  Interchange rA ↔ rX.
072        ADD   CARRY      (N+1)(M+1)  Add the contribution from the
073        JNOV  *+2        (N+1)(M+1)      digit to the right, plus 1.
074        DECX  1          K           If sum is ≤ −b, carry −1.
075        ADD   U,3        (N+1)(M+1)  Add u↓{i+j}.
076        ADD   WM1        (N+1)(M+1)  Add b − 1 to force + sign.
077        JNOV  *+2        (N+1)(M+1)  If no overflow, carry −1.
078        INCX  1          K'          rX ≡ carry + 1.
079        STA   U,3        (N+1)(M+1)  u↓{i+j} ← rA (may be minus zero).
080        DEC1  1          (N+1)(M+1)
081        DEC3  1          (N+1)(M+1)
082        J1NN  2B         (N+1)(M+1)  Repeat for n ≥ i ≥ 0.
083  D5    LDA   QHAT       M+1         D5. Test remainder.
084        STA   Q+M,2      M+1         Set q↓j ← |spose 7q.
085        JXP   D7         M+1         (Here rX = 0 or 1, since v↓0 = 0.)
086  D6    DECA  1                      D6. Add back.
087        STA   Q+M,2                  Set q↓j ← |spose 7q − 1.
088        ENT1  N                      i ← n.
089        ENT3  M+N,2                  (i + j) ← n + j.
090  1H    ENTA  0                      (This is essentially Program A.)
091  2H    ADD   U,3
092        ADD   V,1
093        STA   U,3
094        DEC1  1
095        DEC3  1
096        JNOV  1B
097        ENTA  1
098        J1P   2B                     (Not necessary to add to u↓j.)
099  D7    INC2  1          M+1         D7. Loop on j.
100        J2NP  D3         M+1         Repeat for 0 ≤ j ≤ m.
101  D8    \cdot \cdot \cdot            (See exercise 26)
Note how easily the rather
complex-appearing calculations and decisions of step D3 can
be handled inside the machine. Note also that the program for
step D4 is analogous to Program M, except that the ideas of
Program S have also been incorporated. In step D6, use has been
made of the fact that $v↓0 = 0$, and that $u↓j$ is not needed
in the subsequent calculations; a strict interpretation of Algorithm
D would require line 098 to be ``J1NN 2B''.
The running time for Program D can be estimated by considering
the quantities $M, N, E, K$, and $K↑\prime$ shown in the program.
(These quantities ignore several situations that can only occur
with very low probability; for example, we may assume that lines
048--050, 063--064, and step D6 are never executed.) Here $M
+ 1$ is the number of words in the quotient; $N$ is the number
of words in the divisor; $E$ is the number of times $|spose
7q$ is adjusted downwards in step D3; $K$ and $K↑\prime$ are
the number of times certain ``carry'' adjustments are made during
the multiply-subtract loop. If we assume that $K + K↑\prime$
is approximately $(N + 1)(M + 1)$, and that $E$ is approximately
${1\over 2}M$, we get a total running time of approximately
$$30MN + 30N + 89M + 111$$
cycles, plus $67N + 235M + 4$ more if $d > 1$. (The
program segments of exercises 25 and 26 are included in these
totals.) When $M$ and $N$ are large, this is only about seven
percent longer than the time Program M takes to multiply the
quotient by the divisor.
Further commentary on Algorithm D appears in the exercises at
the close of this section.
It is possible to debug programs
for multiple-precision arithmetic by using the multiplication
and addition routines to check the result of the division routine,
etc. The following type of test data is occasionally useful:
$$(t↑m - 1)(t↑n - 1) = t↑{m+n} - t↑n - t↑m + 1.$$
If $m < n$, this number has the radix $t$ expansion
$${(t - 1)\ \cdots\ (t - 1)\atop m - 1\hbox{ places}}\quad (t - 2)\quad {(t - 1)\ \cdots\ (t - 1)\atop n - m\hbox{ places}}\quad {0\ \cdots\ 0\atop m - 1\hbox{ places}}\quad 1;$$
for example, $(10↑3 - 1)(10↑5 - 1) = 99899001$.
In the case of Program D, it is also necessary to find some
test cases that cause the rarely executed parts of the program
to be used; some portions of that program would probably never
get tested even if a million random test cases were tried.
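The identity is trivial to check mechanically; for instance, this small C fragment (an added illustration) prints $(10↑3 - 1)(10↑5 - 1)$ and thus the pattern 99899001 quoted above.

#include <stdio.h>

int main(void) {
    long t = 10, m = 3, n = 5, tm = 1, tn = 1;
    for (int i = 0; i < m; i++) tm *= t;        /* t^m */
    for (int i = 0; i < n; i++) tn *= t;        /* t^n */
    printf("%ld\n", (tm - 1) * (tn - 1));       /* prints 99899001 */
    return 0;
}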
Now that we have seen how to operate with signed-magnitude numbers,
let us consider what approach should be taken to the same problems
when a computer with complement notation is being used. For
two's complement and ones' complement notations, it is best
to let the radix $b$ be {\sl one-half} the word size; thus for a
32-bit computer word we would use $b = 2↑{31}$ in the above
algorithms. The sign bit of all but the most significant word
of a multiple-precision number will be zero, so that no anomalous
sign correction takes place during the computer's multiplication
and division operations. In fact, the basic meaning of complement
notation requires that we consider all but the most significant
word to be nonnegative: For example, assuming a 10-bit word,
the two's complement number
$$1101111110\qquad 111111010\qquad 011101011$$
(where the sign is given only for the most significant
word) is properly thought of as
$$-2↑{27} + (101111110)↓2 \cdot 2↑{18} + (111111010)↓2 \cdot 2↑9 + (011101011)↓2.$$
Addition of signed numbers is slightly
easier when complement notations are being used, since the routine
for adding $n$-place nonnegative integers can be used for arbitrary
$n$-place integers; the sign appears only in the first word,
so the less significant words may be added together irrespective
of the actual sign. (Special attention must be given to the
leftmost carry when ones' complement notation is being used,
however; it must be added into the least significant word, and
possibly propagated further to the left.) Similarly, we find
that subtraction of signed numbers is slightly simpler with
complement notation. On the other hand, multiplication and division
seem to be done most easily by working with nonnegative quantities
and doing suitable complementation operations beforehand to
make sure both operands are nonnegative; it may be possible
to avoid this complementation by devising some tricks for working
directly with negative numbers in a complement notation, and
it is not hard to see how this could be done in double-precision
multiplication, but care should be taken not to slow down the
inner loops of the subroutines when high precision is required.
Note that the product of two $m$-place numbers in two's complement
notation may require $2m + 1$ places: the square of $-b↑m$ is
$b↑{2m}$.
Let us now turn to an analysis of
the quantity $K$ that arises in Program A, i.e., the number
of carries that occur when $n$-place numbers are being added
together. This quantity $K$ plays no part in the total running
time of Program A, but it does affect the running time of the
counterparts of Program A that deal with complement notations,
and its analysis is interesting in itself as a significant application
of generating functions.
Suppose now that $u$ and $v$ are independent random $n$-place
integers uniformly distributed in the range $0 ≤ u, v < b↑n$.
Let $p↓{nk}$ be the probability that exactly $k$ carries occur
in the addition of $u$ to $v$, {\sl and} that one of these carries
occurred in the most significant position (so that $u + v ≥ b↑n$).
Similarly, let $q↓{nk}$ be the probability that exactly
$k$ carries occur, but there is no carry in the most significant
position. Then it is not hard to see that
$$p↓{0k} = 0,\qquad q↓{0k} = \delta↓{0k},\qquad\hbox{for all $k$};$$
$$\eqalign{p↓{(n+1)(k+1)} ⊗= {b + 1\over 2b}\,p↓{nk} + {b - 1\over 2b}\,q↓{nk},\cr
q↓{(n+1)k} ⊗= {b - 1\over 2b}\,p↓{nk} + {b + 1\over 2b}\,q↓{nk};\cr}\eqno (3)$$
this happens because $(b - 1)/2b$ is the probability
that $u↓1 + v↓1 ≥ b$ and $(b + 1)/2b$ is the probability that
$u↓1 + v↓1 + 1 ≥ b$, when $u↓1$ and $v↓1$ are independently
and uniformly distributed integers in the range 0 ≤ $u↓1, v↓1
< b$.
To obtain further information about these
quantities $p↓{nk}$ and $q↓{nk}$, we may set up the generating
functions
$$P(z, t) = \sum ↓{k,n}p↓{nk}z↑kt↑n,\qquad Q(z, t) = \sum ↓{k,n}q↓{nk}z↑kt↑n;\eqno (4)$$
from (3) we have the basic relations
$$\eqalign{P(z, t) ⊗= zt\left({b + 1\over 2b}\,P(z, t) + {b - 1\over 2b}\,Q(z, t)\right),\cr
Q(z, t) ⊗= 1 + t\left({b - 1\over 2b}\,P(z, t) + {b + 1\over 2b}\,Q(z, t)\right).\cr}$$
These two equations are readily solved for $P(z, t)$;
and if we let
$$G(z, t) = P(z, t) + Q(z, t) = \sum ↓{n}G↓n(z)t↑n,$$
where $G↓n(z)$ is the generating function for the
total number of carries when $n$-place numbers are added, we
find that
$$G(z, t) = (b - zt)/p(z, t),\quad\hbox{where}\quad p(z, t) = b - {1\over 2}(1 + b)(1 + z)t + zt↑2.\eqno (5)$$
Note that $G(1, t) = 1/(1 - t)$, and this checks
with the fact that $G↓n(1)$ must equal 1 (it is the sum of all
the possible probabilities). Taking partial derivatives of (5)
with respect to $z$, we find that
$${∂G\over ∂z} = \sum ↓{n}G↑{↑\prime}↓{n}(z)t↑n = {-t\over p(z, t)} + {t(b - zt)\biglp {1\over 2}(1 + b) - t\bigrp \over p(z, t)↑2}.$$
Now let us put $z = 1$ and expand in
partial fractions:
$$\eqalign{\sum ↓{n}G↑{↑\prime}↓{n}(1)t↑n ⊗= {t\over 2}\left({1\over (1 - t)↑2} - {1\over (b - 1)(1 - t)} + {1\over (b - 1)(b - t)}\right),\cr
\sum ↓{n}G↑{\dprime }↓{n}(1)t↑n ⊗= {t↑2\over 2}\left({1\over (1 - t)↑3} - {1\over (b - 1)↑2(1 - t)} + {1\over (b - 1)↑2(b - t)} + {1\over (b - 1)(b - t)↑2}\right).\cr}$$
It follows that the average number of carries,
i.e., the mean value of $K$, is
$$G↑{↑\prime}↓{n}(1) = {1\over 2}\left(n - {1\over b - 1}\left(1 - \left({1\over b}\right)↑{n}\,\right)\right);\eqno (6)$$
the variance is
$$G↑{\dprime }↓{n}(1) + G↑{↑\prime}↓{n}(1) - G↑{↑\prime}↓{n}(1)↑2 = {1\over 4}\left(n + {2n\over b - 1} - {2b + 1\over (b - 1)↑2} + {2b + 2\over (b - 1)↑2}\left({1\over b}\right)↑{n} - {1\over (b - 1)↑2}\left({1\over b}\right)↑{2n}\right).\eqno (7)$$
So the number of carries is just slightly less
than ${1\over 2}n$ under these assumptions.
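Formula (6) is easy to check empirically; the following C program (an added illustration, with the arbitrary choices $b = 10$, $n = 8$) simulates random $n$-place additions and compares the observed mean number of carries with (6).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
    const int b = 10, n = 8, trials = 100000;
    double total = 0;
    srand(314159);                              /* arbitrary seed */
    for (int tr = 0; tr < trials; tr++) {
        int k = 0, carries = 0;
        for (int j = 0; j < n; j++) {           /* least significant place first */
            int s = rand() % b + rand() % b + k;
            k = (s >= b);                       /* carry out of this place */
            carries += k;
        }
        total += carries;
    }
    double mean = 0.5 * (n - (1.0 - pow(1.0/b, n)) / (b - 1));
    printf("observed %.4f, formula (6) gives %.4f\n", total/trials, mean);
    return 0;
}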
\subsectionbegin{History and Bibliography}
The early history of the classical algorithms described in this
section is left as an interesting project for the reader, and
only the history of their implementation on computers will be
traced here.
The use of $10↑n$ as an assumed radix when multiplying large
numbers on a desk calculator was discussed by D. N. Lehmer and
J. P. Ballantine, {\sl AMM \bf 30} (1923), 67--69.
Double-precision arithmetic on computers was first treated by
J. von Neumann and H. H. Goldstine [J. von Neumann, {\sl Collected
Works \bf 5}, 142--151]. Theorems A and B above are due to D.
A. Pope and M. L. Stein [{\sl CACM \bf 3} (1960), 652--654];
their article also contains a bibliography of earlier work on
double-precision routines. Other ways of choosing the trial
quotient $|spose 7q$ have been discussed by A. G. Cox and H.
A. Luther, {\sl CACM \bf 4} (1961), 353 [divide by $v↓1 + 1$
instead of $v↓1$], and by M. L. Stein, {\sl CACM \bf 7} (1964),
472--474 [divide by $v↓1$ or $v↓1 + 1$ according to the magnitude
of $v↓2$]; Krishnamurthy [{\sl CACM \bf 8} (1965), 179--181]
showed that examination of the single-precision remainder in
the latter method leads to an improvement over Theorem B. Krishnamurthy
and Nandi, {\sl CACM \bf 10} (1967), 809--813, suggested a way
to replace the normalization and unnormalization operations of Algorithm
D by a calculation of $|spose 7q$ based on several leading digits
of the operands.
Several other methods for division have been suggested:
(1) ``Fourier division'' [J. Fourier, {\sl Analyse des
\'equations d\'etermin\'ees} (Paris, 1831), Sec.\
2.21]. This method, which was often used on desk calculators,
essentially obtains each new quotient digit by increasing the
precision of the divisor and the dividend at each step. Some
rather extensive tests by the author have indicated that this
method is certainly inferior to the ``divide and correct'' technique
above, but there may be some applications in which Fourier division
is practical. See D. H. Lehmer, {\sl AMM \bf 33} (1926), 198--206;
J. V. Uspensky, {\sl Theory of Equations} (New York: McGraw-Hill,
1948), 159--164.
(2) ``Newton's method'' for evaluating the reciprocal of a number
was extensively used in early computers when there was no single-precision
division instruction. The idea is to find some initial approximation
$x↓0$ to the number $1/v$, then to let $x↓{n+1} = 2x↓n - vx↑{2}↓{n}$.
This method converges rapidly to $1/v$, since $x↓n = (1 - ε)/v$
implies that $x↓{n+1} = (1 - ε↑2)/v$. Convergence to third order,
i.e., with $ε$ replaced by $O(ε↑3)$ at each step, can be obtained
using the formula
$$x↓{n+1} = x↓n + x↓n(1 - vx↓n) + x↓n(1 - vx↓n)↑2 = x↓n\biglp 1 + (1 - vx↓n)\biglp 1 + (1 - vx↓n)\bigrp \bigrp ,$$
etc.; see P. Rabinowitz, {\sl CACM \bf 4} (1961),
98. For calculations on extremely large numbers, Newton's second-order
method (followed by multiplication by $u$) can actually be considerably
faster than Algorithm D, if we increase the precision of $x↓n$
at each step and if we also use the fast multiplication routines
of Section 4.3.3. (See Algorithm 4.3.3D for details.) Some related
iterative schemes have been discussed by E. V. Krishnamurthy,
{\sl IEEE Trans.\ \bf C--19} (1970), 227--231.
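The quadratic convergence of Newton's reciprocal iteration is easy to watch in ordinary floating point; in this C fragment (an added illustration; the crude starting value 0.3 is an arbitrary choice) the error is roughly squared at every step.

#include <stdio.h>

int main(void) {
    double v = 3.0;
    double x = 0.3;                        /* rough initial guess at 1/v */
    for (int i = 0; i < 5; i++) {
        x = 2*x - v*x*x;                   /* x_{n+1} = 2 x_n - v x_n^2 */
        printf("x = %.17f  error = %.3e\n", x, 1.0/3.0 - x);
    }
    return 0;
}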
(3) Division methods have also been based on the evaluation
of
$${u\over v + ε} = {u\over v}\left(1 - {ε\over v} + \left({ε\over v}\right)↑{2} - \left({ε\over v}\right)↑{3} + \cdots\,\right).$$
See H. H. Laughlin, {\sl AMM \bf 37} (1930), 287--293.
We have used this idea in the double-precision case (Eq.\ 4.2.3--3).
Besides the references
just cited, the following early articles concerning multiple-precision
arithmetic are of interest: High-precision floating-point routines
using ones' complement arithmetic are described by A. H. Stroud
and D. Secrest, {\sl Comp.\ J. \bf 6} (1963), 62--66. Extended-precision
subroutines for use in FORTRAN programs are described by B.
I. Blum, {\sl CACM \bf 8} (1965), 318--320; and for use in ALGOL
by M. Tienari and V. Suokonautio, {\sl BIT \bf 6} (1966), 332--338.
Arithmetic on integers with {\sl unlimited} precision, making
use of linked memory allocation techniques, has been elegantly
described by G. E. Collins, {\sl CACM \bf 9} (1966), 578--589.
For a much larger repertoire of operations, including logarithms
and trigonometric functions, see R. W. Brent, {\sl ACM Trans.\
Math.\ Software} (to appear).
We have restricted our discussion in this section to arithmetic
techniques for use in computer programming. There are many algorithms
for {\sl hardware} implementation of arithmetic operations that
are very interesting, but which appear to be inapplicable to
computer programs for high-precision numbers; for example, see
G. W. Reitwiesner, ``Binary Arithmetic,'' {\sl Advances in Computers
\bf 1} (New York: Academic Press, 1960), 231--308; O. L. MacSorley,
{\sl Proc.\ IRE \bf 49} (1961), 67--91; G. Metze, {\sl IRE Trans.\
\bf EC--11} (1962), 761--764; H. L. Garner, ``Number Systems
and Arithmetic,'' {\sl Advances in Computers \bf 6} (New York:
Academic Press, 1965), 131--194. The minimum achievable execution
time for hardware addition and multiplication operations has
been investigated by S. Winograd, {\sl JACM \bf 12} (1965),
277--285, {\bf 14} (1967), 793--802; by R. Brent, {\sl IEEE Trans.\
\bf C--19} (1970), 758--759; and by R. W. Floyd, {\sl IEEE Symp.\
Foundations Comp.\ Sci.\ \bf 16} (1975), 3--5.
\exbegin{EXERCISES}
\exno 1. [42] Study
the early history of the classical algorithms for arithmetic,
by looking up the writings of, say, Sun Ts|spose ??2u, al-Khow|spose
7arizm|spose 7i, Fibonacci, and Robert Recorde, and by translating
their methods as faithfully as possible into more precise algorithmic
notation.
\exno 2. [15] Generalize Algorithm A so that it does ``column
addition,'' i.e., obtains the sum of $m$ nonnegative $n$-place
integers. (Assume that $m ≤ b$.)
\exno 3. [21] Write a \MIX\ program
for the algorithm of exercise 2, and estimate its running time
as a function of $m$ and $n$.
\exno 4. [M21] Give a formal proof
of the validity of Algorithm A, using the method of ``inductive
assertions'' as explained in Section 1.2.1.
\exno 5. [21] Algorithm A adds the two inputs by going from
right to left, but sometimes the data is more readily accessible
from left to right. Design an algorithm that produces the same
answer as Algorithm A, but that generates the digits of the
answer from left to right, and goes back to change previous
values if a carry occurs to make a previous value incorrect.
({\sl Note:} Early Hindu and Arabic manuscripts were based
on addition from left to right in this way; the right-to-left
addition algorithm was a refinement due to later Arabic writers,
perhaps because Arabic is written from right to left.)
\exno 6. [22] Design an algorithm that adds from left to right
(as in exercise 5), but that does not store a digit of the
answer until this digit cannot possibly be affected by future
carries; there is to be no changing of any answer digit once
it has been stored.\xskip [{\sl Hint:} Keep track of the number
of consecutive $(b - 1)$'s that have not yet been stored in
the answer.] This sort of algorithm would be appropriate, for
example, in a situation where the input and output numbers are
to be read and written from left to right on magnetic tapes.
\exno 7. [M26] Determine the average number of times the algorithm
of exercise 5 will find that a carry makes it necessary to go
back and change $k$ digits of the partial answer, for $k = 1,2,
\ldotss , n$. (Assume that both inputs are independently and
uniformly distributed between 0 and $b↑n - 1$.)
\exno 8. [M26] Write a \MIX\ program
for the algorithm of exercise 5, and determine its average running
time based on the expected number of carries as computed in
the text.
\exno 9. [21] Generalize Algorithm A to obtain an algorithm
that adds two $n$-place numbers in a {\sl mixed-radix} number
system, with bases $b↓0, b↓1, \ldots$ (from right to left).
Thus the least significant digits lie between 0 and $b↓0 - 1$,
the next digits lie between 0 and $b↓1 - 1$, etc.; cf.\ Eq.\ 4.1--9.
\exno 10. [18] Would
Program S work properly if the instructions on lines 06 and
07 were interchanged? If the instructions on lines 05 and 06
were interchanged?
\exno 11. [10] Design an algorithm that compares two nonnegative
$n$-place integers $u = u↓1u↓2 \ldotsm u↓n$ and $v = v↓1v↓2 \ldotsm
v↓n$ with radix $b$, to determine whether $u < v, u = v$, or
$u > v$.
\exno 12. [16] Algorithm S assumes that we know which of the
two input operands is the larger; if this information is not
known, we could go ahead and perform the subtraction anyway,
and we would find that an extra ``borrow'' is still present
at the end of the algorithm. Design another algorithm that
could be used (if there is a ``borrow'' present at the end of
Algorithm S) to complement $w↓1w↓2 \ldotsm w↓n$ and therefore
to obtain the absolute value of the difference of $u$ and $v$.
\exno 13. [21] Write a \MIX\ program that multiplies $(u↓1u↓2
\ldotsm u↓n)↓b$ by $v$, where $v$ is a single-precision number
(i.e., $0 ≤ v < b$), producing the answer $(w↓0w↓1 \ldotsm w↓n)↓b$.
How much running time is required?
\exno 14. [M24] Give a formal proof of the validity of Algorithm
M, using the method of ``inductive assertions'' as explained
in Section 1.2.1.
\exno 15. [M20] If we wish to form the product of two $n$-place
fractions, $(.u↓1u↓2 \ldotsm u↓n)↓b \times (.v↓1v↓2 \ldotsm v↓n)↓b$,
and to obtain only an $n$-place approximation $(.w↓1w↓2 \ldotsm
w↓n)↓b$ to the result, Algorithm M could be used to obtain a
$2n$-place answer that is then rounded to the desired approximation.
But this involves about twice as much work as is necessary for
reasonable accuracy, since the products $u↓iv↓j$ for $i + j
> n + 2$ contribute very little to the answer.
Give an estimate of the maximum error that can
occur, if these products $u↓iv↓j$ for $i + j > n + 2$ are not
computed during the multiplication, but are assumed to be zero.
\exno 16. [20] Design an algorithm
that divides a nonnegative $n$-place integer $u↓1u↓2 \ldotsm
u↓n$ by $v$, where $v$ is a single-precision number (i.e., $0 < v < b$), producing the quotient $w↓1w↓2 \ldotsm w↓n$ and remainder
$r$.
\exno 17. [M20] In the notation of Fig.\ 6, assume that $v↓1
≥ \lfloor b/2\rfloor $; show that if $u↓0 = v↓1$, we must have
$q = b - 1$ or $b - 2$.
\exno 18. [M20] In the notation of Fig.\ 6, show that if $q↑\prime
= \lfloor (u↓0b + u↓1)/(v↓1 + 1)\rfloor $, then $q↑\prime ≤
q$.
\exno 19. [M21] In the notation of Fig.\ 6, let $|spose 7q$ be
an approximation to $q$, and let $|spose 7r = u↓0b + u↓1 - |spose
7qv↓1$. Assume that $v↓1 > 0$. Show that if $v↓2|spose 7q >
b|spose 7r + u↓2$, then $q < |spose 7q$. [{\sl Hint:} Strengthen
the proof of Theorem A by examining the influence of $v↓2$.]
\exno 20. [M22] Using the notation and assumptions of exercise
19, show that if $v↓2|spose 7q ≤ b|spose 7r + u↓2$, then $|spose
7q = q$ or $q = |spose 7q - 1$.
\exno 21. [M23] Show that if $v↓1 ≥ \lfloor b/2\rfloor $, and
if $v↓2|spose 7q ≤ b|spose 7r + u↓2$ but $|spose 7q ≠ q$ in
the notation of exercises 19 and 20, then $u - qv ≥ (1 - 3/b)v$.
(The latter event occurs with approximate probability $3/b$,
so that when $b$ is the word size of a computer we must have
$q↓j = |spose 7q$ in Algorithm D except in very rare circumstances.)
\exno 22. [24] Find an example of a four-digit number divided
by a three-digit number, using Algorithm D when the radix $b$
is 10, for which step D6 is necessary.
\exno 23. [M23] Given that $v$ and $b$ are integers, and that
1 ≤ $v < b$, prove that $\lfloor b/2\rfloor ≤ v\lfloor b/(v
+ 1)\rfloor < (v + 1)\lfloor b/(v + 1)\rfloor ≤ b$.
\exno 24. [M20] Using the law of the distribution of leading
digits explained in Section 4.2.4, give an approximate formula
for the probability that $d = 1$ in Algorithm D. (When $d =
1$, it is, of course, possible to omit most of the calculation
in steps D1 and D8.)
\exno 25. [26] Write a \MIX\ routine for step D1, which is needed
to complete Program D.
\exno 26. [21] Write a \MIX\ routine for step D8, which is needed
to complete Program D.
\exno 27. [M20] Prove that at the beginning of step D8 in Algorithm
D, the number $u↓{m+1}u↓{m+2} \ldotsm u↓{m+n}$ is always an exact
multiple of $d$.
\exno 28. [M30] (A. Svoboda, {\sl Stroje na Zpracov\'an\'\i{}
Informac\'\i{} \bf 9} (1963), 25--32.)\xskip Let $v = (v↓1v↓2
\ldotsm v↓n)↓b$ be any radix $b$ integer, where $v↓1 ≠ 0$. Perform
the following operations:

\algstep N1. If $v↓1 < b/2$, multiply $v$ by $\lfloor
(b + 1)/(v↓1 + 1)\rfloor$. Let the result of this step be $(v↓0v↓1v↓2
\ldotsm v↓n)↓b$.

\algstep N2. If $v↓0 = 0$, set $v ← v + (1/b)\lfloor b(b - v↓1)/(v↓1
+ 1)\rfloor v$; let the result of this step be $(v↓0v↓1v↓2 \ldotsm
v↓n.v↓{n+1} \ldots)↓b$. Repeat step N2 until $v↓0 ≠ 0$.
Prove that step N2 will be performed
at most three times, and that we must always have $v↓0 = 1$,
$v↓1 = 0$ at the end of the calculations.

[{\sl Note:} If $u$ and $v$ are both multiplied
by the above constants, we do not change the value of the quotient
$u/v$, and the divisor has been converted into the form $(10v↓2
\ldotsm v↓n.v↓{n+1}v↓{n+2}v↓{n+3})↓b$. This form of the divisor
may be very convenient because, in the notation of Algorithm
D, we may simply take $|spose 7q = u↓j$ as a trial divisor at
the beginning of step D3, or $|spose 7q = b - 1$ when $(u↓{j-1},
u↓j) = (1, 0)$.]
\exno 29. [15] Prove or disprove: At the beginning of step D7
of Algorithm D, we always have $u↓j = 0$.
\exno 30. [22] If memory space is limited, it may be desirable
to use the same storage locations for both input and output
during the performance of some of the algorithms in this section.
Is it possible to have $w↓1, \ldotss , w↓n$ stored in the same
respective locations as $u↓1, \ldotss,u↓n$ or $v↓1, \ldotss ,
v↓n$ during Algorithm A or S? Is it possible to have $q↓0, \ldotss
, q↓m$ occupy the same locations as $u↓0, \ldotss , u↓m$ in Algorithm
D? Is there any permissible overlap of memory locations between
input and output in Algorithm M?
\exno 31. [28] Assume that $b = 3$ and that $u = (u↓1 \ldotsm
u↓{m+n})↓3, v = (v↓1 \ldotsm v↓n)↓3$ are integers in {\sl balanced
ternary} notation (cf.\ Section 4.1), $v↓1 ≠ 0$. Design a long-division
algorithm that divides $u$ by $v$, obtaining a remainder whose
absolute value does not exceed ${1\over 2}|v|$. Try to find
an algorithm that would be efficient if incorporated into the
arithmetic circuitry of a balanced ternary computer.
\exno 32. [M40] Assume that $b = 2i$ and that $u$ and $v$ are
complex numbers expressed in the quarter-imaginary number system.
Design algorithms that divide $u$ by $v$, perhaps obtaining
a suitable remainder of some sort, and compare their efficiency.
{\sl References:} M. Nadler, {\sl CACM \bf 4} (1961),
192--193; Z. Pawlak and A. Wakulicz, {\sl Bull.\ de l'Acad.\ Polonaise
des Sciences}, Classe III, {\bf 5} (1957), 233--236 (see also
pp. 803--804); and exercise 4.1--15.
\exno 33. [M40] Design an algorithm for taking square roots,
analogous to Algorithm D and to the pencil-and-paper method
for extracting square roots.
\exno 34. [40] Develop a set of computer subroutines for doing
the four arithmetic operations on arbitrary integers, putting
no constraint on the size of the integers except for the implicit
assumption that the total memory capacity of the computer should
not be exceeded. (Use linked memory allocation, so that no time
is wasted in finding room to put the results.)
\exno 35. [40] Develop a set of computer subroutines for ``decuple-precision
floating-point'' arithmetic, using excess 0, base $b$, nine-place
floating-point number representation, where $b$ is the computer
word size, and allowing a full word for the exponent. (Thus
each floating-point number is represented in 10 words of memory,
and all scaling is done by moving full words instead of shifting
within the words.)
\exno 36. [M42] Compute the values of the fundamental constants
listed in Appendix B to much higher precision than the 40-place
values listed there. ({\sl Note:} The first 100,000 digits
of the decimal expansion of $π$ were published by D. Shanks
and J. W. Wrench, Jr., in {\sl Math.\ Comp.\ \bf 16} (1962), 76--99.)
{\:r 4.3.2. Modular Arithmetic}

Another interesting alternative is available
for doing arithmetic on large integer numbers, based on some
simple principles of number theory. The idea is to have several
``moduli'' $m↓1, m↓2, \ldotss , m↓r$ that contain no common
factors, and to work indirectly with ``residues'' $u$ mod $m↓1$,
$u$ mod $m↓2$, $\ldotss$, $u$ mod $m↓r$ instead of directly with the
number $u$.
For convenience in notation throughout this section, let
$$u↓1 = u \mod m↓1,\qquad u↓2 = u \mod m↓2,\qquad \ldotss ,\qquad u↓r = u \mod m↓r.\eqno (1)$$
It is easy to compute $(u↓1, u↓2, \ldotss , u↓r)$
from an integer number $u$ by means of division; and---more
important---no information is lost in this process, since we
can always recompute $u$ from $(u↓1, u↓2, \ldotss , u↓r)$ provided
that we know $u$ is not too large. For example, if $0 ≤ u <
v ≤ 1000$, it is impossible to have $(u$ mod 7, $u$ mod 11,
$u$ mod 13) equal to $(v$ mod 7, $v$ mod 11, $v$ mod 13). This
is a consequence of the ``Chinese Remainder Theorem'' stated
below.
Therefore we may regard $(u↓1, u↓2, \ldotss , u↓r)$ as a new
type of internal computer representation, a ``modular representation,''
of the integer $u$.
The advantages of a modular representation
are that addition, subtraction, and multiplication are very
simple:
$$(u↓1, \ldotss , u↓r) + (v↓1, \ldotss , v↓r) = \biglp (u↓1 + v↓1)\mod m↓1, \ldotss , (u↓r + v↓r)\mod m↓r\bigrp ,\eqno (2)$$
$$(u↓1, \ldotss , u↓r) - (v↓1, \ldotss , v↓r) = \biglp (u↓1 - v↓1)\mod m↓1, \ldotss , (u↓r - v↓r)\mod m↓r\bigrp ,\eqno (3)$$
$$(u↓1, \ldotss , u↓r) \times (v↓1, \ldotss , v↓r) = \biglp (u↓1 \times v↓1)\mod m↓1, \ldotss , (u↓r \times v↓r)\mod m↓r\bigrp .\eqno (4)$$
It is easy to prove these formulas; for example,
to prove (4) we must show that $uv$ mod $m↓j = (u$ mod $m↓j)(v$
mod $m↓j)$ mod $m↓j$ for each modulus $m↓j$. But this is a basic
fact of elementary number theory: $x$ mod $m↓j = y$ mod $m↓j$
if and only if $x ≡ y$ (modulo $m↓j$); furthermore if $x ≡ x↑\prime$
and $y ≡ y↑\prime$, then $xy ≡ x↑\prime y↑\prime$ (modulo $m↓j$);
hence $(u$ mod $m↓j)(v$ mod $m↓j) ≡ uv$ (modulo $m↓j$).
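A tiny C illustration of (1)--(4) may be added here (the moduli 7, 11, 13 are the ones used in the example above; the operands are arbitrary small numbers whose product stays below $7 \cdot 11 \cdot 13 = 1001$):

#include <stdio.h>

int main(void) {
    const int r = 3, m[3] = {7, 11, 13};    /* pairwise relatively prime */
    int u = 21, v = 43;                     /* u*v = 903 < 1001 */
    int ur[3], vr[3], wr[3];
    for (int j = 0; j < r; j++) {
        ur[j] = u % m[j];                   /* Eq. (1): residues of u */
        vr[j] = v % m[j];                   /* residues of v */
        wr[j] = (ur[j] * vr[j]) % m[j];     /* Eq. (4): componentwise product */
    }
    for (int j = 0; j < r; j++)             /* check against direct arithmetic */
        printf("mod %2d: %2d (expect %2d)\n", m[j], wr[j], (u*v) % m[j]);
    return 0;
}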
The disadvantages of a
modular representation are that it is comparatively difficult
to test whether a number is positive or negative or to test
whether or not $(u↓1, \ldotss , u↓r)$ is greater than $(v↓1,
\ldotss,v↓r)$. It is also difficult to test whether or not overflow
has occurred as the result of an addition, subtraction, or multiplication,
and it is even more difficult to perform division. When these
operations are required frequently in conjunction with addition,
subtraction, and multiplication, the use of modular arithmetic
can be justified only if fast means of conversion into and out
of the modular representation are available. Therefore conversion
between modular and positional notation is one of the principal
topics of interest to us in this section.
The processes of addition, subtraction, and multiplication using
(2), (3), and (4) are called residue arithmetic or {\sl modular
arithmetic.} The range of numbers that can be handled by modular
arithmetic is equal to $m = m↓1m↓2 \ldotsm m↓r$, the product
of the moduli. Therefore we see that the amount of time required
to add, subtract, or multiply $n$-digit numbers using modular
arithmetic is essentially proportional to $n$ (not counting
the time to convert in and out of modular representation). This
is no advantage at all when addition and subtraction are considered,
but it can be a considerable advantage with respect to multiplication
since the conventional method of the preceding section requires
an execution time proportional to $n↑2$.
|qleft \quad \xskip Moreover, on a computer that allows many
operations to take place simultaneously, modular arithmetic
can be a significant advantage even for addition and subtraction;
the operations with respect to different moduli can all be done
at the same time, so we obtain a substantial increase in speed.
The same kind of decrease in execution time could not be achieved
by the conventional techniques discussed in the previous section,
since carry propagation must be considered. Perhaps some day
highly parallel computers will make simultaneous operations
commonplace, so that modular arithmetic will be of significant
importance in ``real-time'' calculations when a quick answer
to a single problem requiring high precision is needed. (With
highly parallel computers, it is often preferable to run $k$
{\sl separate} programs simultaneously, instead of running a {\sl
single} program $k$ times as fast, since the latter alternative
is more complicated but does not utilize the machine any more
efficiently; ``real-time'' calculations are exceptions that
make the inherent parallelism of modular arithmetic more significant.)
Now let us examine the basic fact that underlies the modular
representation of numbers:
\algbegin Theorem C (Chinese Remainder Theorem). {\sl Let
$m↓1, m↓2, \ldotss , m↓r$ be positive integers that are relatively
prime in pairs, i.e.,}
$${\rm gcd}(m↓j, m↓k) = 1\qquad$ when\qquad $j ≠ k.\eqno (5)$$
{\sl Let $m = m↓1m↓2 \ldotsm m↓r$, and let $a, u↓1, u↓2,
\ldotss , u↓r$ be integers. Then there is exactly one integer
$u$ that satisfies the conditions}
$$a ≤ u < a + m,\qquad$ and\qquad $u ≡ u↓j\quad$ (modulo $m↓j)\qquad$ for\qquad $1 ≤ j ≤ r.\eqno (6)$$
{\sl Proof.} \xskip If $u ≡ v$ (modulo $m↓j)$ for $1
≤ j ≤ r$, then $u - v$ is a multiple of $m↓j$ for all $j$, so
(5) implies that $u - v$ is a multiple of $m = m↓1m↓2 \ldotsm
m↓r$. This argument shows that there is {\sl at most} one solution
of (6). To complete the proof we must only show the existence
of {\sl at least} one solution, and this can be done in two
simple ways:
\yskip METHOD 1 (``Nonconstructive'' proof). \xskip As $u$ runs through
the $m$ distinct values $a ≤ u < a + m$, the $r$-tuples $(u$
mod $m↓1, \ldotss , u$ mod $m↓r)$ must also run through $m$ distinct
values, since (6) has at most one solution. But there are exactly
$m↓1m↓2 \ldotsm m↓r$ possible $r$-tuples $(v↓1, \ldotss , v↓r)$
such that 0 ≤ $v↓j < m↓j$. Therefore each $r$-tuple must occur
exactly once, and there must be some value of $u$ for which
($u$ mod $m↓1, \ldotss , u$ mod $m↓r) = (u↓1, \ldotss , u↓r)$.
|qleft \12skip METHOD 2 (``Constructive'' proof). \xskip We
can find numbers $M↓j$, $1 ≤ j ≤ r$, such that
$$M↓j ≡ 1$ (modulo $m↓j),\qquad M↓j ≡ 0$ (modulo $m↓k)\qquad$ for\qquad $k ≠ j.\eqno (7)$$
|qctr \9skip This follows because (5) implies that $m↓j$ and
$m/m↓j$ are relatively prime, so we may take
$$M↓j = (m/m↓j)↑{\varphi (m↓j)}\eqno (8)$$
by Euler's theorem (exercise 1.2.4--28). Now the
number
$$u = a + \biglp (u↓1M↓1 + u↓2M↓2 + \cdots + u↓rM↓r - a)$ mod $m\bigrp \eqno (9)$$
satisfies all the conditions of (6).
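Method 2 can likewise be rendered as a short program. The following Python sketch is illustrative only: it assumes that every $m↓j$ is prime, so that $\varphi (m↓j) = m↓j - 1$ in (8), and it keeps only $M↓j$ mod $m$, which works just as well in (9).

def crt(residues, moduli, a=0):
    # find the unique u with a <= u < a + m and u = u_j (modulo m_j),
    # via Eq. (9); assumes the moduli are distinct primes
    m = 1
    for mj in moduli:
        m *= mj
    s = 0
    for uj, mj in zip(residues, moduli):
        Mj = pow(m // mj, mj - 1, m)     # Eq. (8), with phi(m_j) = m_j - 1
        s += uj * Mj
    return a + (s - a) % m               # Eq. (9)

For example, crt((2, 3), (3, 5)) returns 8, the only integer in the range $0 ≤ u < 15$ that is $≡ 2$ (modulo 3) and $≡ 3$ (modulo 5).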
|qleft \12skip \quad \xskip A very special case of this theorem
was stated by the Chinese mathematician Sun Ts\u u, who
gave a rule called {\sl t'ai-yen} (``great generalization'');
the date of his writing is very uncertain, but it is thought to
be between A.D. 280 and 473. [See Joseph Needham,
{\sl Science and Civilization in China \bf 3} (Cambridge University
Press, 1959), 33--34, for an interesting discussion.] Theorem
C was apparently first stated and proved in its proper generality
by Chhin Chiu-Shao in his {\sl Shu Shu Chiu Chang} (1247). Numerous
early contributions to this theory have been summarized by L.
E. Dickson in his {\sl History of the Theory of Numbers \bf 2}
(New York: Chelsea, 1952), 57--64.
As a consequence of Theorem C, we may use modular representation
for numbers in any consecutive interval of $m = m↓1m↓2 \ldotsm
m↓r$ integers. For example, we could take $a = 0$ in (6), and
work only with nonnegative integers $u$ less than $m$. On the
other hand, when addition and subtraction are being done, as
well as multiplication, it is usually most convenient to assume
that all the moduli $m↓1, m↓2, \ldotss , m↓r$ are odd numbers,
so that $m = m↓1m↓2 \ldotsm m↓r$ is odd, and to work with integers
in the range
$$-{m\over 2} < u < {m\over 2},\eqno (10)$$
which is completely symmetrical about zero.
To perform the basic operations indicated in (2), (3), and (4),
we need to compute $(u↓j + v↓j)$mod $m↓j, (u↓j - v↓j)$mod $m↓j$,
and $u↓jv↓j$ mod $m↓j$, when 0 ≤ $u↓j, v↓j < m↓j$. If $m↓j$
is a single-precision number, it is most convenient to form
$u↓jv↓j$ mod $m↓j$ by doing a multiplication and then a division
operation. For addition and subtraction, the situation is a
little simpler, since no division is necessary; the following
formulas may conveniently be used:
$$(u↓j + v↓j)$ mod $m↓j = \left\{{u↓j + v↓j,\hfill\atop u↓j + v↓j - m↓j,}\right.\qquad {{\rm if}\quad u↓j + v↓j < m↓j;\atop {\rm if}\quad u↓j + v↓j ≥ m↓j.}\eqno (11)$$
$$(u↓j - v↓j)$ mod $m↓j = \left\{{u↓j - v↓j,\hfill\atop u↓j - v↓j + m↓j,}\right.\qquad {{\rm if}\quad u↓j - v↓j ≥ 0;\atop {\rm if}\quad u↓j - v↓j < 0.}\eqno (12)$$
\9skip |newtype (Cf.\ Section 3.2.1.1.) In this case, since we
want $m$ to be as large as possible, it is easiest to let $m↓1$
be the largest odd number that fits in a computer word, to let
$m↓2$ be the largest odd number < $m↓1$ that is relatively prime
to $m↓1$, to let $m↓3$ be the largest odd number < $m↓2$ that
is relatively prime to both $m↓1$ and $m↓2$, and so on until
enough $m↓j$'s have been found to give the desired range $m$.
Efficient ways to determine whether or not two integers are
relatively prime are discussed in Section 4.5.2.
As a simple example, suppose that we have a decimal computer
with a word size of only 100. Then the procedure described in
the previous paragraph would give
$$m↓1 = 99,\quad m↓2 = 97,\quad m↓3 = 95,\quad m↓4 = 91,\quad m↓5 = 89,\quad m↓6 = 83,\eqno (13)$$
and so on.
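The greedy choice of moduli just described might be mechanized as follows. In this illustrative Python sketch, word_size and target_range are hypothetical parameters; word_size = 100 matches the example.

from math import gcd

def choose_moduli(word_size, target_range):
    candidate = word_size - 1            # largest value that fits in a word
    if candidate % 2 == 0:
        candidate -= 1                   # make it odd
    moduli, m = [], 1
    while m < target_range and candidate > 1:
        if all(gcd(candidate, mj) == 1 for mj in moduli):
            moduli.append(candidate)     # relatively prime to all chosen so far
            m *= candidate
        candidate -= 2                   # next smaller odd number
    return moduli

With word_size = 100, the list begins 99, 97, 95, 91, 89, 83, in agreement with (13).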
On binary computers it is sometimes desirable to choose the
$m↓j$ in a different way, by selecting
$$m↓j = 2↑{e↓j} - 1.\eqno (14)$$
In other words, each modulus is one less than a
power of 2. Such a choice of $m↓j$ often makes the basic arithmetic
operations simpler, because it is relatively easy to work modulo
$2↑{e↓j} - 1$, as in ones' complement arithmetic. When
the moduli are chosen according to this strategy, it is helpful
to relax the condition $0 ≤ u↓j < m↓j$ slightly, so that we
require only
$$0 ≤ u↓j < 2↑{e↓j},\qquad u↓j ≡ u\quad$ (modulo $m↓j).\eqno (15)$$
%folio 366 galley 8 (C) Addison-Wesley 1978
|qleft \20skip |newtype Thus, the value $u↓j = m↓j = 2↑{e↓j} - 1$
is allowed as an optional alternative to $u↓j = 0$, since
this does not affect the validity of Theorem C, and it means
we are allowing $u↓j$ to be any $e↓j$-bit binary number. Under
this assumption, the operations of addition and multiplication
modulo $m↓j$ become the following:
$$u↓j \oplus v↓j = \left\{{u↓j + v↓j,\hfill\atop \biglp (u↓j + v↓j)$ mod $2↑{e↓j}\bigrp + 1,}\right.\qquad {{\rm if}\quad u↓j + v↓j < 2↑{e↓j};\atop {\rm if}\quad u↓j + v↓j ≥ 2↑{e↓j}.}\eqno (16)$$
$$u↓j \otimes v↓j = (u↓jv↓j$ mod $2↑{e↓j}) \oplus \lfloor u↓jv↓j/2↑{e↓j}\rfloor .\eqno (17)$$
[Here $\oplus$ and $\otimes$ refer to the operations to
be done on the individual components of $(u↓1, \ldotss , u↓r)$
and $(v↓1, \ldotss , v↓r)$ when adding or multiplying, respectively,
using the convention (15).] Equation (12) may be used for subtraction.
Clearly, these operations can be readily performed even when
$m↓j$ is larger than the computer's word size; it is a simple
matter to compute the remainder of a positive number modulo
a power of 2, or to divide a number by a power of 2. In (17)
we have the sum of the ``upper half'' and the ``lower half''
of the product, as discussed in exercise 3.2.1.1--8.
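Under convention (15) the operations (16) and (17) reduce to an end-around carry and a folding of the double-length product. A minimal Python sketch, illustrative only, is:

def oplus(u, v, e):
    # addition modulo 2^e - 1 with end-around carry, Eq. (16)
    s = u + v
    return s if s < (1 << e) else (s & ((1 << e) - 1)) + 1

def otimes(u, v, e):
    # multiplication modulo 2^e - 1: combine the upper and lower halves
    # of the double-length product, Eq. (17)
    t = u * v
    return oplus(t & ((1 << e) - 1), t >> e, e)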
If moduli of the form $2↑{e↓j} - 1$ are to be used, we
must know under what conditions the number $2↑e - 1$ is relatively
prime to the number $2↑f - 1$. Fortunately, there is a very
simple rule,
$${\rm gcd}(2↑e - 1, 2↑f - 1) = 2↑{{\rm gcd}(e,f)} - 1,\eqno (18)$$
which states in particular that $2↑e - 1$ and $2↑f - 1$ are
relatively prime if and only if $e$ and $f$ are relatively
prime. Equation (18) follows from Euclid's algorithm and the
identity
$$(2↑e - 1)$ mod $(2↑f - 1) = 2↑{e\,{\rm mod}\,f} - 1.\eqno (19)$$
(See exercise 6.) Thus we could choose for example $m↓1 =
2↑{35} - 1$, $m↓2 = 2↑{34} - 1$, $m↓3 = 2↑{33} - 1$, $m↓4 =
2↑{31} - 1$, $m↓5 = 2↑{29} - 1$, if we had a computer with word
size $2↑{35}$ and if we wanted to represent numbers up to
$m↓1m↓2m↓3m↓4m↓5 > 2↑{161}$.
This range of integers is not big enough to make modular arithmetic
faster than the conventional method, and we usually find that
modular arithmetic using convention (15) is advantageous only
when the $m↓j$ are larger than the word size or when division
is inconvenient.
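Incidentally, rule (18) is easy to test by machine; the following throwaway Python check (it proves nothing, but is reassuring) verifies the identity for all exponents below 30:

from math import gcd

for e in range(1, 30):
    for f in range(1, 30):
        # rule (18): gcd(2^e - 1, 2^f - 1) = 2^gcd(e,f) - 1
        assert gcd(2**e - 1, 2**f - 1) == 2**gcd(e, f) - 1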
As we have already observed, the operations of conversion to
and from modular representation are very important. If we are
given a number $u$, its modular representation $(u↓1, \ldotss
, u↓r)$ may be obtained by dividing $u$ by $m↓1, \ldotss , m↓r$
and saving the remainders. A possibly more attractive procedure,
if $u = (v↓mv↓{m-1} \ldotsm v↓0)↓b$, is to evaluate the polynomial
$$(\cdots (v↓mb + v↓{m-1})b + \cdots )b + v↓0$$
using modular arithmetic. When $b = 2$ and when the
modulus $m↓j$ has the special form $2↑{e↓j} - 1$, both
of these methods reduce to quite a simple procedure:
|qleft Consider the binary representation of $u$ with blocks
of $e↓j$ bits grouped together,
$$u = a↓tA↑t + a↓{t-1}A↑{t-1} + \cdots + a↓1A + a↓0,\eqno (20)$$
where $A = 2↑{e↓j}$ and $0 ≤ a↓k < 2↑{e↓j}$ for $0 ≤ k ≤ t$. Then
$$u ≡ a↓t + a↓{t-1} + \cdots + a↓1 + a↓0\quad$ (modulo $2↑{e↓j} - 1),\eqno (21)$$
since $A ≡ 1$. Therefore we may obtain $u↓j$ by
adding the $e↓j$-bit numbers $a↓t \oplus \cdots \oplus a↓1 \oplus
a↓0$, modulo $2↑{e↓j} - 1$, as in Eq.\ (16). This process
is similar to the familiar device of ``casting out nines'' that
is used to determine $u$ mod 9 when $u$ is expressed in the
decimal system.
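This block-summing conversion is only a few lines of code. The following Python sketch is an illustration only; it returns the residue 0 rather than the optional alternative $2↑e - 1$.

def mod_mersenne(u, e):
    # u mod (2^e - 1), by summing the e-bit blocks of u as in (20)-(21)
    mask = (1 << e) - 1
    s = 0
    while u:
        s += u & mask                    # add the next block a_k
        u >>= e
    while s > mask:                      # fold the block sum back, cf. Eq. (16)
        s = (s & mask) + (s >> e)
    return 0 if s == mask else s         # canonical residue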
Conversion back from modular form to positional notation is
somewhat more difficult. It is interesting in this regard to
make a few side remarks about the way computers make us change
our viewpoint towards mathematical proofs: Theorem C tells us
that the conversion from $(u↓1, \ldotss , u↓r)$ to $u$ is possible,
and two proofs are given. The first proof we considered is a
classical one that makes use only of very simple concepts,
namely the facts that
$$|indent i) any number that is a multiple of $m↓1$ and of
$m↓2, \ldotss $, and of $m↓r$, must be a multiple of $m↓1m↓2
\ldotsm m↓r$ when the $m↓j$'s are pairwise relatively prime;
and
|qleft ii) if $m$ things are put into $m$ boxes with no two
things in the same box, then there must be one in each box.
|qleft \12skip |cancelindent By traditional notions of mathematical
aesthetics, this is no doubt the nicest proof of Theorem C;
but from a computational standpoint it is completely worthless.
It amounts to saying, ``Try $u = a$, $a + 1$, $\ldots$ until you
find a value for which $u ≡ u↓1$ (modulo $m↓1), \ldotss , u ≡
u↓r$ (modulo $m↓r)$.''
|qleft \quad \xskip The second proof of Theorem C is more explicit;
it shows how to compute $r$ new constants $M↓1, \ldotss , M↓r$,
and to get the solution in terms of these constants by formula
(9). This proof uses more complicated concepts (for example,
Euler's theorem), but it is much more satisfactory from a computational
standpoint, since the constants $M↓1, \ldotss , M↓r$ need to
be determined only once. On the other hand, the determination
of $M↓j$ by Eq.\ (8) is certainly not trivial, since the evaluation
of Euler's $\varphi $-function requires, in general, the factorization
of $m↓j$ into prime powers. Furthermore, $M↓j$ is likely to
be a terribly large number, even if we compute only the quantity
$M↓j$ mod $m$ (which will work just as well as $M↓j$ in (9)).
Since $M↓j$ mod $m$ is uniquely determined if (7) is to be satisfied
(because of the Chinese Remainder Theorem), we can see that,
in any event, Eq.\ (9) requires a lot of high-precision calculation,
and such calculation is just what we wished to avoid by modular
arithmetic in the first place.
So we need an even {\sl better} proof of Theorem C if we are
going to have a really usable method of conversion from $(u↓1,
\ldotss , u↓r)$ to $u$. Such a method was suggested by H. L.
Garner in 1958; it can be carried out using $\left({r\atop 2}\right)$ constants
$c↓{ij}$ for $1 ≤ i < j ≤ r$, where
$$c↓{ij}m↓i ≡ 1\quad$ (modulo $m↓j).\eqno (22)$$
|qctr \9skip These constants $c↓{ij}$ are readily computed using
Euclid's algorithm, since Algorithm 4.5.2X determines $a, b$
such that $am↓i + bm↓j =$ gcd($m↓i, m↓j) = 1$ and we may take
$c↓{ij} = a$. When the moduli have the special form $2↑{e↓j} - 1$,
a simple method of determining $c↓{ij}$ is given in
exercise 6.
|qleft \xskip Once the $c↓{ij}$ have been determined satisfying
(22), we can set
$$\eqalign{v↓1 ⊗← u↓1$ mod $m↓1,\cr
v↓2 ⊗← (u↓2 - v↓1)c↓{12}$ mod $m↓2,\cr
v↓3 ⊗← \biglp (u↓3 - v↓1)c↓{13} - v↓2\bigrp c↓{23}$ mod $m↓3,\cr
⊗\qquad \cdots\cr
v↓r ⊗← \biglp \cdots \biglp (u↓r - v↓1)c↓{1r} - v↓2\bigrp c↓{2r} - \cdots - v↓{r-1}\bigrp c↓{(r-1)r}$ mod $m↓r.\cr}\eqno (23)$$
Then
$$u = v↓rm↓{r-1} \ldotsm m↓1 + \cdots + v↓3m↓2m↓1 + v↓2m↓1 + v↓1\eqno (24)$$
|qctr \9skip is a number satisfying the conditions
$$0 ≤ u < m,\qquad u ≡ u↓j\quad$ (modulo $m↓j),\qquad 1 ≤ j ≤ r.\eqno (25)$$
(See exercise 8; another way of rewriting (23)
that does not involve as many auxiliary constants is given
in exercise 7.) Equation (24) is a {\sl mixed-radix representation}
of $u$, which may be converted to binary or decimal notation
using the methods of Section 4.4. If 0 ≤ $u < m$ is not the
desired range, an appropriate multiple of $m$ can be added or
subtracted after the conversion process.
The advantage of the computation shown in (23) is that the calculation
of $v↓j$ can be done using only arithmetic mod $m↓j$, which
is already built into the modular arithmetic algorithms. Furthermore,
(23) allows parallel computation: We can start with $(v↓1, \ldotss
, v↓r) ← (u↓1$ mod $m↓1, \ldotss , u↓r$ mod $m↓r)$, then at time
$j$ for $1 ≤ j < r$ we simultaneously set $v↓k ← (v↓k - v↓j)c↓{jk}$
mod $m↓k$ for $j < k ≤ r$. An alternative way to compute the
mixed-radix representation, allowing similar possibilities for
parallelism, has been discussed by A. S. Fraenkel, {\sl Proc.\
ACM Nat.\ Conf.\ \bf 19} (Philadelphia, 1965), E1.4.
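Garner's scheme (22)--(24) can be stated compactly as a program. In the following Python sketch, which is an illustration rather than a tuned routine, the constants $c↓{ij}$ of (22) are obtained from the language's built-in modular inverse instead of Algorithm 4.5.2X:

def garner(residues, moduli):
    # convert (u_1, ..., u_r) to u, 0 <= u < m, via (23) and (24)
    r = len(moduli)
    v = list(residues)
    for j in range(1, r):
        for i in range(j):
            cij = pow(moduli[i], -1, moduli[j])   # c_ij of Eq. (22)
            v[j] = ((v[j] - v[i]) * cij) % moduli[j]
    u, scale = 0, 1                      # mixed-radix evaluation, Eq. (24)
    for vj, mj in zip(v, moduli):
        u += vj * scale
        scale *= mj
    return u

For example, garner((2, 3), (3, 5)) returns 8. In practice the $c↓{ij}$ would of course be precomputed once, as the text points out.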
It is important to observe that the mixed-radix representation
(24) is sufficient to compare the magnitudes of two modular
numbers. For if we know that $0 ≤ u < m$ and $0 ≤ u↑\prime <
m$, then we can tell if $u < u↑\prime$ by first doing the conversion
to $v↓1, \ldotss , v↓r$ and $v↑\prime↓1, \ldotss , v↑\prime↓r$,
then testing if $v↓r < v↑\prime↓r$, or if $v↓r = v↑\prime↓r$
and $v↓{r-1} < v↑\prime↓{r-1}$, etc. It is not necessary
to convert all the way to binary or decimal notation if we only
want to know whether $(u↓1, \ldotss , u↓r)$ is less than $(u↑\prime↓1,
\ldotss , u↑\prime↓r)$.
|qleft \quad \xskip The operation of comparing two numbers,
or of deciding if a modular number is negative, is intuitively
very simple, so we would expect to find a much easier method
for making this test than the conversion to mixed-radix form.
But the following theorem shows that there is little hope of
finding a substantially easier method, since the range of a
modular number depends essentially on all of its residues:
%folio 370 galley 9 (C) Addison-Wesley 1978
\algbegin Theorem S. ({\rm Nicholas Szab\'o, 1961}). {\sl In terms of the notation above, assume that
$m↓1 < \sqrt{m}$, and let L be any value in
the range}
$$m↓1 ≤ L ≤ m - m↓1.\eqno (26)$$
{\sl Let g be any function such that the set $\{g(0),
g(1), \ldotss , g(m↓1 - 1)\}$ contains fewer than $m↓1$ values. Then
there are numbers u and v such that}
$$g(u$ mod $m↓1) = g(v$ mod $m↓1),\qquad u$ mod $m↓j = v$ mod $m↓j\qquad$ for\qquad $2 ≤ j ≤ r;\eqno (27)$$
$$0 ≤ u < L ≤ v < m.\eqno (28)$$
{\sl Proof.} \xskip By hypothesis, there must exist
numbers $u ≠ v$ satisfying (27), since $g$ must take on the
same value for two different residues. Let $(u, v)$ be a pair
of values with $0 ≤ u < v < m$ satisfying (27), for which $u$
is a minimum. Since $u↑\prime = u - m↓1$ and $v↑\prime = v -
m↓1$ also satisfy (27), we must have $u↑\prime < 0$ by the minimality
of $u$. Hence $u < m↓1 ≤ L$; and if (28) does not hold, we must
have $v < L$. But $v > u$, and $v - u$ is a multiple of $m↓2
\ldotsm m↓r = m/m↓1$, so $v ≥ v - u ≥ m/m↓1 > m↓1$. Therefore,
if (28) does not hold for $(u, v)$, it will be satisfied for
the pair $(u\dprime , v\dprime ) = (v - m↓1, u + m - m↓1)$.
|qleft \12skip \quad \xskip Of course, a similar result can
be proved for any $m↓j$ in place of $m↓1$; and we could also
replace (28) by the condition $``a ≤ u < a + L ≤ v < a + m$''
with only minor changes in the proof. Therefore Theorem S shows
that many simple functions cannot be used to determine the range
of a modular number.
Let us now reiterate the main points of the discussion in this
section: Modular arithmetic can be a significant advantage for
applications in which the predominant calculations involve exact
multiplication (or raising to a power) of large integers, combined
with addition and subtraction, but where there is very little
need to divide or compare numbers, {\sl or to test whether intermediate
results ``overflow'' out of range.} (It is important not to
forget the latter restriction; methods are available to test
for overflow, as in exercise 12, but they are in general so
complicated that they nullify the advantages of modular arithmetic.)
Several applications for modular computations have been discussed
by H. Takahasi and Y. Ishibashi, {\sl Information Proc.\
in Japan \bf 1} (1961), 28--42.
|qleft \quad \xskip An example of such an application is the
exact solution of linear equations with rational coefficients.
For various reasons it is desirable in this case to assume that
the moduli $m↓1, m↓2, \ldotss , m↓r$ are all large prime numbers;
the linear equations can be solved independently modulo each
$m↓j$. A detailed discussion of this procedure has been given
by I. Borosh and A. S. Fraenkel [{\sl Math.\ Comp.\ \bf 20} (1966),
107--112]. By means of their method, the nine independent solutions
of a system of 111 linear equations in 120 unknowns were obtained
exactly in less than one hour's running time on a CDC 1604 computer.
The same procedure is worth while also for solving simultaneous
linear equations with floating-point coefficients, when the
matrix of coefficients is ill-conditioned. The modular technique
(treating the given floating-point coefficients as exact rational
numbers) gives a method for obtaining the {\sl true} answers
in less time than conventional methods can produce reliable
{\sl approximate} answers! [See M. T. McClellan, {\sl JACM \bf 20}
(1973), 563--588, for further developments of this approach;
and see also E. H. Bareiss, {\sl J. Inst.\ Math.\ and Appl.\
\bf 10} (1972), 68--104, for a discussion of its limitations.]
The published literature concerning modular arithmetic is mostly
oriented towards hardware design, since the carry-free properties
of modular arithmetic make it attractive from the standpoint
of high-speed operation. The idea was first published by A.
Svoboda and M. Valach in the Czechoslovakian journal {\sl Stroje
na Zpracov\'an\'\i\ Informac\'\i\ \bf 3} (1955),
247--295; then independently by H. L. Garner [{\sl IRE Trans.\
\bf EC--8} (1959), 140--147]. The use of moduli of the form
$2↑e|infsup j - 1$ was suggested by A. S. Fraenkel [{\sl JACM
\bf 8} (1961), 87--96], and several advantages of such moduli
were demonstrated by A. Sch\"onhage [{\sl Computing \bf 1}
(1966), 182--196]. See the book {\sl Residue Arithmetic and
its Applications to Computer Technology} by N. S. Szab\'o
and R. I. Tanaka (New York: McGraw-Hill, 1967), for additional
information and a comprehensive bibliography of the subject.
Further discussion of modular arithmetic can be found in part
B of Section 4.3.3.
|qleft \24skip {\:r EXERCISES}
\exno 1. [20] Find all
integer numbers $u$ that satisfy the conditions $u$ mod 7 =
1, $u$ mod 11 = 6, $u$ mod 13 = 5, and 0 ≤ $u < 1000$.
\exno 2. [M20] Would Theorem C still be true if we allowed $a,
u↓1, u↓2, \ldotss , u↓r$ and $u$ to be arbitrary real numbers
(not just integers)?
\exno 3. [M26] ({\sl Generalized Chinese Remainder Theorem.})\xskip
Let $m↓1, m↓2, \ldotss , m↓r$ be positive integers. Let $m$ be
the least common multiple of $m↓1, m↓2, \ldotss , m↓r$, and let
$a, u↓1, u↓2, \ldotss , u↓r$ be any integers. Prove that there
is exactly one integer $u$ that satisfies the conditions
$$a ≤ u < a + m,\qquad u ≡ u↓j\quad$ (modulo $m↓j),\qquad 1 ≤ j ≤ r,$$
provided that
$$u↓i ≡ u↓j\quad \biglp$ modulo gcd$(m↓i, m↓j)\bigrp ,\qquad 1 ≤ i < j ≤ r;$$
and there is no such integer $u$ when
the latter condition fails to hold.
\exno 4. [20] Continue the process shown in (13); what would
$m↓7, m↓8, m↓9, m↓{10}$ be?
\exno 5. [M23] Suppose that the method of (13) is continued
until no more $m↓j$ can be chosen; does this method give the
largest attainable value $m↓1m↓2 \ldotsm m↓r$ such that the $m↓j$
are odd positive integers less than 100 that are relatively
prime in pairs?
\exno 6. [M22] Let $e, f, g$ be nonnegative integers.\xskip (a) Show
that $2↑e ≡ 2↑f$ (modulo $2↑g - 1)$ if and only if $e ≡ f$ (modulo
$g)$.\xskip (b) Given that $e$ mod $f = d$ and $ce$ mod $f = 1$, prove
that
$$\biglp (1 + 2↑d + \cdots + 2↑{(c-1)d}) \cdot (2↑e - 1)\bigrp$ mod $(2↑f - 1) = 1.$$
|qctr \9skip [Thus, we have a comparatively simple formula for
the inverse of $2↑e - 1$, modulo $2↑f - 1$, as required in (22).]
\exno 7. [M21] Show that (23) can be rewritten as follows:
$$\eqalign{v↓1 ⊗← u↓1$ mod $m↓1,\cr
v↓2 ⊗← (u↓2 - v↓1)c↓{12}$ mod $m↓2,\cr
v↓3 ⊗← \biglp u↓3 - (v↓1 + m↓1v↓2)\bigrp c↓{13}c↓{23}$ mod $m↓3,\cr
⊗\qquad \cdots\cr
v↓r ⊗← \biglp u↓r - (v↓1 + m↓1(v↓2 + m↓2(v↓3 + \cdots + m↓{r-2}v↓{r-1}) \ldotsm ))\bigrp c↓{1r} \ldotsm c↓{(r-1)r}$ mod $m↓r.\cr}$$
|qleft \9skip |newtype If the formulas are rewritten in this
way, we see that only $r - 1$ constants $C↓j = c↓{1j} \ldotsm
c↓{(j-1)j}$ mod $m↓j$ are needed instead of $r(r - 1)/2$ constants
$c↓{ij}$ as in (23). Discuss the relative merits of this version
of the formula as compared to (23), from the standpoint of computer
calculation.
\exno 8. [M21] Prove that the number $u$ defined by (23) and
(24) satisfies (25).
\exno 9. [M20] Show how to go from the values $v↓1, \ldotss ,
v↓r$ of the mixed-radix notation (24) back to the original residues
$u↓1, \ldotss , u↓r$, using only arithmetic mod $m↓j$ to compute
$u↓j$.
\exno 10. [M25] An integer $u$ that lies in the symmetrical
range (10) might be represented by finding the numbers $u↓1,
\ldotss , u↓r$ such that $u ≡ u↓j$ (modulo $m↓j)$ and $-m↓j/2
< u↓j < m↓j/2$, instead of insisting that 0 ≤ $u↓j < m↓j$ as
in the text. Discuss the modular arithmetic procedures that
would be used in this case (including the conversion process,
(23)\bigrp .
\exno 11. [M23] Assume that all the $m↓j$ are odd, and that
$u = (u↓1, \ldotss , u↓r)$ is known to be even, where $0 ≤ u
< m$. Find a reasonably fast method to compute $u/2$ using modular
arithmetic.
\exno 12. [M10] Prove that, if 0 ≤ $u, v < m$, the modular addition
of $u$ and $v$ causes overflow (i.e., is outside the range allowed
by the modular representation) if and only if the sum is less
than $u$. (Thus the overflow detection problem is equivalent
to the comparison problem.)
\exno 13. [M25] ({\sl Automorphic numbers.})\xskip An $n$-place decimal
number $x > 1$ is called an ``automorph'' by recreational mathematicians
if the last $n$ digits of $x↑2$ are equal to $x$; i.e., if $x↑2$
mod $10↑n = x$. [See {\sl Scientific American \bf 218} (January,
1968), 125.] For example, 9376 is a 4-place automorph, since
$9376↑2 = 87909376$.
\yskip\hang\textindent{a)}Prove that an $n$-place number $x > 1$ is
an automorph if and only if $x$ mod $5↑n = 0$ or 1, and $x$
mod $2↑n = 1$ or 0, respectively. [Thus, if $m↓1 = 2↑n$ and
$m↓2 = 5↑n$, the only two $n$-place automorphs are the numbers
$M↓1$ and $M↓2$ in (7).]
\hang\textindent{b)}Prove that if $x$ is an $n$-place automorph,
then $(3x↑2 - 2x↑3)$ mod $10↑{2n}$ is a $2n$-place automorph.
\hang\textindent{c)}Given that $cx ≡ 1$ (modulo $y)$, what
is a simple formula for a number $c↑\prime$ such that $c↑\prime
x↑2 ≡ 1$ (modulo $y↑2)$?
%folio 372 galley 10 (C) Addison-Wesley 1978
|qleft \20skip |newtype {\:r 4.3.3. How Fast Can We Multiply?}
|qleft \6skip The conventional method for multiplication, Algorithm
4.3.1M, requires approximately $cmn$ operations to multiply
an $m$-digit number by an $n$-digit number, where $c$ is a constant.
In this section, let us assume for convenience that $m = n$,
and let us consider the following question: {\sl Does every
general computer algorithm for multiplying two n-digit numbers
require an execution time proportional to $n↑2$, as $n$ increases?}
|qleft \quad \xskip (In this question, a ``general'' algorithm
means one that accepts, as input, the number $n$ and two arbitrary
$n$-digit numbers in positional notation, and whose output is
their product in positional form. Certainly if we were allowed
to choose a different algorithm for each value of $n$, the question
would be of no interest, since multiplication could be done
for any specific value of $n$ by a ``table-lookup'' operation
in some huge table. The term ``computer algorithm'' is meant
to imply an algorithm that is suitable for implementation on
a digital computer such as \MIX, and the execution time is to
be the time it takes to perform the algorithm on such a computer.)
\subsectionbegin{A. Digital methods} The answer to
the above question is, rather surprisingly, ``No,'' and, in
fact, it is not very difficult to see why. For convenience,
let us assume throughout this section that we are working with
integers expressed in binary notation. If we have two $2n$-bit
numbers $u = (u↓{2n-1} \ldotsm u↓1u↓0)↓2$ and $v = (v↓{2n-1}
\ldotsm v↓1v↓0)↓2$, we can write
$$u = 2↑nU↓1 + U↓0,\qquad v = 2↑nV↓1 + V↓0,\eqno (1)$$
|qctr \9skip where $U↓1 = (u↓{2n-1} \ldotsm u↓n)↓2$ is the ``most-significant
half'' of $u$ and $U↓0 = (u↓{n-1} \ldotsm u↓0)↓2$ is the ``least-significant
half''; and similarly $V↓1 = (v↓{2n-1} \ldotsm v↓n)↓2, V↓0 =
(v↓{n-1} \ldotsm v↓0)↓2$. Now we have
$$uv = (2↑{2n} + 2↑n)U↓1V↓1 + 2↑n(U↓1 - U↓0)(V↓0 - V↓1) + (2↑n + 1)U↓0V↓0.\eqno (2)$$
|qctr \9skip This formula reduces the problem of multiplying
$2n$-bit numbers to three multiplications of $n$-bit numbers,
$U↓1V↓1, (U↓1 - U↓0)(V↓0 - V↓1)$, and $U↓0V↓0$, plus some simple
shifting and adding operations.
Formula (2) can be used for double-precision multiplication
when a quadruple-precision result is desired, and it is just
a little faster than the traditional method on many machines.
It is more important to observe that we can use formula (2)
to define a recursive process for multiplication that is significantly
faster than the familiar order-$n↑2$ method when $n$ is large:
If $T(n)$ is the time required to perform multiplication of
$n$-bit numbers, we have
$$T(2n) ≤ 3T(n) + cn\eqno (3)$$
|qctr \9skip for some constant $c$, since the right-hand side
of (2) uses just three multiplications plus some additions and
shifts. Relation (3) implies by induction that
$$$T(2↑k) ≤ c(3↑k - 2↑k),\qquad k ≥ 1,\eqno (4)$
|qctr \9skip if we choose $c$ to be large enough so that this
inequality is valid when $k = 1$; and therefore we have
$$T(n) ≤ T(2↑{\lceil \lg n\rceil}) ≤ c(3↑{\lceil \lg n\rceil} - 2↑{\lceil \lg n\rceil}) ≤ 3c \cdot 3↑{\lg n} = 3cn↑{\lg 3}.\eqno (5)$$
Relation (5) shows that the running time for multiplication
can be reduced from order $n↑2$ to order $n↑{\lg 3} \approx
n↑{1.585}$, and of course this is a much faster algorithm when
$n$ is large.
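The recursive method based on (2) is quite short when written out. The following Python sketch is illustrative only: the 32-bit cutoff is arbitrary, and the possibly negative middle product is split into a sign and a product of nonnegative numbers so that the recursion applies.

def karatsuba(u, v, nbits):
    # multiply nbits-bit numbers u and v by formula (2), recursively
    if nbits <= 32:
        return u * v                     # base case: built-in multiply
    n = nbits // 2
    U1, U0 = u >> n, u & ((1 << n) - 1)  # most and least significant halves
    V1, V0 = v >> n, v & ((1 << n) - 1)
    A = karatsuba(U1, V1, nbits - n)     # U1 V1
    B = karatsuba(U0, V0, n)             # U0 V0
    d1, d0 = U1 - U0, V0 - V1            # the middle factors may be negative
    sign = 1 if (d1 >= 0) == (d0 >= 0) else -1
    C = sign * karatsuba(abs(d1), abs(d0), nbits - n)
    return (A << (2 * n)) + ((A + B + C) << n) + B

The identity behind the last line is exactly (2): the result is $A \cdot 2↑{2n} + (A + B + C)2↑n + B$, where $C = (U↓1 - U↓0)(V↓0 - V↓1)$.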
(A similar but more complicated method for doing multiplication
with running time of order $n↑{\lg 3}$ was apparently first suggested
by A. Karatsuba and Yu. Ofman, {\sl Doklady Akad.\ Nauk SSSR \bf 145}
(1962), 293--294. Curiously, this idea does not seem to have
been discovered before 1962; none of the ``calculating prodigies''
who have become famous for their ability to multiply large numbers
mentally have been reported to use any such method, although
formula (2) adapted to decimal notation would seem to lead to
a reasonably easy way to multiply eight-digit numbers in one's
head.)
The running time can be reduced still further, in the limit
as $n$ approaches infinity, if we observe that the method just
used is essentially the special case $r = 1$ of a more general
method that yields
$$T\biglp (r + 1)n\bigrp ≤ (2r + 1)T(n) + cn\eqno (6)$$
for any fixed $r$. This more general method
can be obtained as follows: Let
$$u = (u↓{(r+1)n-1} \ldotsm u↓1u↓0)↓2\qquad$ and\qquad $v = (v↓{(r+1)n-1} \ldotsm v↓1v↓0)↓2$$
be broken into $r + 1$ pieces,
$$$u = U↓r2↑{rn} + \cdots + U↓12↑n + U↓0,\qquad v = V↓r2↑{rn}
+ \cdots + V↓12↑n + V↓0,\eqno (7)$
|qctr \9skip where each $U↓j$ and each $V↓j$ is an $n$-bit number.
Consider the polynomials
$$$U(x) = U↓rx↑r + \cdots + U↓1x + U↓0,\qquad V(x) = V↓rx↑r
+ \cdots + V↓1x + V↓0,\eqno (8)$
|qctr \9skip and let
$$$W(x) = U(x)V(x) = W↓{2r}x↑{2r} + \cdots + W↓1x + W↓0.\eqno
(9)$
Since $u = U(2↑n)$ and $v = V(2↑n)$, we have $uv
= W(2↑n)$, so we can easily compute $uv$ if we know the coefficients
of $W(x)$. The problem is to find a good way to compute the
coefficients of $W(x)$ by using only $2r + 1$ multiplications of $n$-bit
numbers plus some further operations that involve only an execution
time proportional to $n$. This can be done by computing
$$$U(0)V(0) = W(0),\qquad U(1)V(1) = W(1),\qquad \ldotss ,\qquad
U(2r)V(2r) = W(2r).\eqno (10)$
|qctr \9skip The coefficients of a polynomial of degree $2r$
can be written as a linear combination of the values of that
polynomial at $2r + 1$ distinct points; such a linear combination
requires an execution time at most proportional to $n$. (Actually,
the products $U(j)V(j)$ are not strictly products of $n$-bit
numbers, but they are products of at most $(n + t)$-bit numbers,
where $t$ is a fixed value depending on $r$. It is easy to design
a multiplication routine for such slightly longer numbers whose
running time exceeds $T(n)$ by at most $O(n)$.)
\quad \xskip Relation (6) can be used
to show that $T(n) ≤ c↓3n↑{\log ↓{r+1}(2r+1)} < c↓3n↑{1+\log ↓{r+1}2}$, using a method
analogous to the derivation of (5), so we have now proved:
\thbegin Theorem A. {\sl Given $ε > 0$, there exists a constant
c(ε) and a multiplication algorithm such that the number of
elementary operations T(n) needed to multiply two n-bit numbers
satisfies}
$$T(n) < c(ε)n↑{1+ε}.\eqno (11)$$
|qctr \9skip \quad \xskip This theorem is still not the result
we are after. It is unsatisfactory for practical purposes in
that the method becomes much more complicated as $ε → 0$ (and
therefore as $r → ∞)$, causing $c(ε)$ to grow so rapidly that
extremely huge values of $n$ are needed before we have any significant
improvement over (5). And it is unsatisfactory for theoretical
purposes because it does not make use of the full power of the
polynomial method on which it is based. We can obtain a better
result if we let {\sl r vary} with $n$, choosing larger and
larger values of $r$ as $n$ increases. This idea is due to A.
L. Toom [{\sl Doklady Akad.\ Nauk SSSR \bf 150} (1963), 496--498;
tr.\ into English in {\sl Soviet Mathematics \bf 3} (1963), 714--716],
who used it to show that computer circuitry for multiplication
of $n$-bit numbers can be constructed involving a fairly small
number of components as $n$ grows. S. A. Cook [{\sl On the minimum
computation time of functions (}Thesis, Harvard University,
1966), 51--77] later showed how Toom's method can be adapted
to fast computer programs.
Before we discuss the Toom-Cook algorithm any further, let us
study a small example of the transition from $U(x)$ and $V(x)$
to the coefficients of $W(x)$. This example will not demonstrate
the efficiency of the method, since the numbers are too small,
but it points out some useful simplifications that we can make
in the general case. Suppose that we want to multiply $u = 1234$
times $v = 2341$; in binary notation this is $u = (0100\,1101\,0010)↓2$ and $v = (1001\,0010\,0101)↓2$. Taking $r = 2$ and four-bit groups in (7) and (8), we have $U(x) = 4x↑2 + 13x + 2$ and $V(x) = 9x↑2 + 2x + 5$.
%folio 376 galley 11 WARNING: Some bad spots on this tape. (C) Addison-Wesley 1978
|qleft \20skip |newtype Hence we find, for $W(x) = U(x)V(x)$,
$$
\hfill U(0) ⊗=\hfill 2,\quad U(1) ⊗=\hfill 19,\quad U(2) ⊗=\hfill 44,\quad U(3) ⊗=\hfill 77,\quad U(4) ⊗= 118;\cr
\hfill V(0) ⊗=\hfill 5,\quad V(1) ⊗=\hfill 16,\quad V(2) ⊗=\hfill 45,\quad V(3) ⊗=\hfill 92,\quad V(4) ⊗= 157;\cr
\hfill W(0) ⊗=\hfill 10,\quad W(1) ⊗=\hfill 304,\quad W(2) ⊗=\hfill 1980,\quad W(3) ⊗=\hfill 7084,\quad W(4) ⊗= 18526.\eqno (12)\cr
$$
|qright \9skip Our job now is to compute the five coefficients
of $W(x)$ from the latter five values.
There is an attractive little algorithm that can be used to
compute the coefficients of a polynomial $W(x) = W↓{m-1}x↑{m-1}
+ \cdots + W↓1x + W↓0$ when the values $W(0), W(1), \ldotss ,
W(m - 1)$ are given: Let us first write
$$W(x) = \theta ↓{m-1}x↑{\underline{m-1}} + \theta ↓{m-2}x↑{\underline{m-2}} + \cdots + \theta ↓1x↑{\underline 1} + \theta ↓0,\eqno (13)$$
where $x↑{\underline k} = x(x - 1) \ldotsm (x - k + 1)$ denotes a falling factorial power, and
where the $\theta ↓j$ are unknown as well as the $W↓j$. Now
$$W(x + 1) - W(x) = (m - 1)\theta ↓{m-1}x↑{\underline{m-2}} + (m - 2)\theta ↓{m-2}x↑{\underline{m-3}} + \cdots + \theta ↓1,$$
and by induction we find that for all $k ≥ 0$
$${1\over k!}\left(W(x + k) - \left({k\atop 1}\right)W(x + k - 1) + \left({k\atop 2}\right)W(x + k - 2) - \cdots + (-1)↑kW(x)\right) = \left({m - 1\atop k}\right)\theta ↓{m-1}x↑{\underline{m-1-k}} + \left({m - 2\atop k}\right)\theta ↓{m-2}x↑{\underline{m-2-k}} + \cdots + \left({k\atop k}\right)\theta ↓k.\eqno (14)$$
Denoting the left-hand side of (14) in the customary
way as $(1/k!)\Delta ↑kW(x)$, we see that
$${1\over k!}\Delta ↑kW(x) = {1\over k}\left({1\over (k - 1)!}\Delta ↑{k-1}W(x + 1) - {1\over (k - 1)!}\Delta ↑{k-1}W(x)\right)$$
and $(1/k!)\Delta ↑kW(0) = \theta ↓k$. So the
coefficients $\theta ↓j$ can be evaluated using a very simple
method, illustrated here for the polynomial $W(x)$ in (12):
$$
10\cr
⊗ 294\cr
304⊗⊗1382/2 = 691\cr
⊗ 1676⊗⊗1023/3 = 341\cr
1980⊗⊗3428/2 = 1714⊗⊗144/4 = 36\eqno (15)\cr
⊗ 5104⊗⊗1455/3 = 485\cr
7084⊗⊗6338/2 = 3169\cr
⊗ 11442\cr
18526\cr
$$
\9skip |newtype The leftmost column of this tableau is a listing
of the given values of $W(0), W(1), \ldotss , W(4)$; the $k$th
succeeding column is obtained by computing the difference between
successive values of the preceding column and dividing by $k$.
The coefficients $\theta ↓i$ appear at the top of the columns,
so that $\theta ↓0 = 10, \theta ↓1 = 294, \ldotss , \theta ↓4
= 36$, and we have
$$\eqalign{W(x) ⊗= 36x↑{\underline 4} + 341x↑{\underline 3} + 691x↑{\underline 2} + 294x↑{\underline 1} + 10\cr
⊗= \biglp ((36(x - 3) + 341)(x - 2) + 691)(x - 1) + 294\bigrp x + 10.\cr}\eqno (16)$$
In general, we can write
$$W(x) = \biglp \cdots ((\theta ↓{m-1}(x - m + 2) + \theta ↓{m-2})(x - m + 3) + \theta ↓{m-3})(x - m + 4) + \cdots + \theta ↓1\bigrp x + \theta ↓0,$$
and this formula shows how the coefficients $W↓{m-1},
\ldotss , W↓1, W↓0$ can be obtained from the $\theta $'s:
$$
36⊗341\cr
⊗-3 \cdot 36\cr
|linerule 36⊗233⊗691\cr
⊗-2 \cdot 36⊗-2 \cdot 233⊗⊗⊗(17)\cr
|linerule 36⊗161⊗225⊗294\cr
⊗-1 \cdot 36⊗-1 \cdot 161⊗-1 \cdot 225\cr
|linerule 36⊗125⊗64⊗69⊗10\cr
$$
Here the numbers below the horizontal lines successively
show the coefficients of the polynomials
$$36x + 233,\qquad 36x↑2 + 161x + 225,\qquad 36x↑3 + 125x↑2 + 64x + 69$$
that arise at intermediate stages in the evaluation of (16).
\quad \xskip From this tableau we have
$$W(x) = 36x↑4 + 125x↑3 + 64x↑2 + 69x + 10,$$
so the answer to our original problem is $1234 \cdot 2341 = W(16)$, where $W(16)$ is obtained by adding and shifting.
A generalization of this method for obtaining coefficients is
discussed in Section 4.6.4.
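Both tableaus are simple loops over an array. In the following illustrative Python sketch, thetas carries out the divided differences of (15) (the same loop as step C7 of Algorithm C below), and to_coefficients carries out the synthetic multiplications of (17):

def thetas(values):
    # values = W(0), W(1), ..., W(m-1); returns theta_0, ..., theta_{m-1},
    # by the divided-difference tableau (15)
    th = list(values)
    m = len(th)
    for k in range(1, m):
        for t in range(m - 1, k - 1, -1):
            th[t] = (th[t] - th[t - 1]) // k    # always an exact division
    return th

def to_coefficients(th):
    # expand the nested form below (16); returns W_{m-1}, ..., W_1, W_0,
    # following the tableau (17)
    poly = [th[-1]]                             # highest coefficient first
    for j in range(len(th) - 2, -1, -1):
        prod = [poly[0]]                        # multiply poly by (x - j) ...
        for i in range(1, len(poly)):
            prod.append(poly[i] - j * poly[i - 1])
        prod.append(-j * poly[-1] + th[j])      # ... and add theta_j
        poly = prod
    return poly

Here thetas([10, 304, 1980, 7084, 18526]) gives [10, 294, 691, 341, 36], and to_coefficients of that list gives [36, 125, 64, 69, 10], in agreement with (15) and (17).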
The basic Stirling number identity,
$$x↑n = \left\{{n\atop n}\right\}x↑{\underline n} + \cdots + \left\{{n\atop 1}\right\}x↑{\underline 1} + \left\{{n\atop 0}\right\},$$
|qctr \9skip Eq.\ 1.2.6--41, shows that if the coefficients of
$W(x)$ are nonnegative, so are the numbers $\theta ↓j$, and
in such a case {\sl all of the intermediate results in the above
computation are nonnegative.} This further simplifies the Toom--Cook
multiplication algorithm, which we will now consider in detail.
The algorithm uses the following stacks, which
are manipulated during this algorithm:
\yskip Stacks $U, V$:\xskip Temporary storage of $U(j)$ and $V(j)$ in step C4.
Stack $C$:\xskip Numbers to be multiplied, and control codes.
Stack $W$:\xskip Storage of $W(j)$.
\yskip These stacks may contain either binary numbers or special
symbols called code-1, code-2, code-3, and code-4. The algorithm
also constructs an auxiliary table of numbers $q↓k, r↓k$; this
table is maintained in such a manner that it may be stored as
a linear list, and all accesses to this table are made in a
simple manner so that a single pointer (which traverses the
list, moving back and forth) may be used to access the current
table entry of interest.
(Stack $C$ and $W$ in this algorithm are used to control the
recursive mechanism of the multiplication algorithm in a reasonably
straightforward manner that is a special case of the general
procedures discussed in Chapter 8.)
\algstep C1. [Compute $q, r$ tables.] Set stacks
$U, V, C$, and $W$ empty. Set
$$k ← 1,\qquad q↓0 ← q↓1 ← 16,\qquad r↓0 ← r↓1 ← 4,\qquad Q ← 4,\qquad R ← 2.$$
\qquad Now if $q↓{k-1} + q↓k < n$, set
$$k ← k + 1,\qquad Q ← Q + R,\qquad R ← \lfloor \sqrt{Q}\rfloor ,\qquad q↓k ← 2↑Q,\qquad r↓k ← 2↑R,$$
\qquad and repeat this operation until $q↓{k-1} + q↓k ≥ n$. ({\sl Note:}
The calculation of $R ← \lfloor \sqrt{Q}\rfloor$ does not require a square root to
be taken, since we may simply set $R ← R + 1$ if $(R + 1)↑2
≤ Q$ and leave $R$ unchanged if $(R + 1)↑2 > Q$; see exercise
2. In this step we build the sequence
$$
\hfill k ⊗= ⊗0⊗1⊗2⊗3⊗4⊗5⊗6⊗\ldots\cr
\hfill q↓k ⊗= ⊗2↑4⊗2↑4⊗2↑6⊗2↑8⊗2↑{10}⊗2↑{13}⊗2↑{16}⊗\ldots\cr
\hfill r↓k ⊗= ⊗2↑2⊗2↑2⊗2↑2⊗2↑2⊗2↑3⊗2↑3⊗2↑4⊗\ldots\cr
$$
|qleft \qquad The multiplication of 70000-bit numbers would
cause this step to terminate with $k = 6$, since $70000 < 2↑{13} + 2↑{16}$.)
\algstep C2. [Put $u, v$ on stack.] Put code-1 on
stack $C$, then place $u$ and $v$ onto stack $C$ as numbers
of exactly $q↓{k-1} + q↓k$ bits each.
\algstep C3. [Check recursion level.] Decrease $k$ by 1. If
$k = 0$, the top of stack $C$ contains two 32-bit numbers, $u$
and $v$; set $w ← uv$ using a built-in routine for multiplying
32-bit numbers, and go to step C10. If $k > 0$, set $r ← r↓k$,
$q ← q↓k$, $p ← q↓{k-1} + q↓k$, and go on to step C4.
%folio 379 galley 12 (C) Addison-Wesley 1978
\algstep C4. [Break into $r + 1$
parts.] Let the number at the top of stack $C$ be regarded as
a list of $r + 1$ numbers with $q$ bits each, $(U↓r \ldotsm U↓1U↓0)↓{2↑q}$.
(The top of stack $C$ now contains an $(r + 1)q = (q↓k +
q↓{k+1})$-bit number.) For $j = 0, 1, \ldotss , 2r$ compute the
$p$-bit numbers
$$\qquad (\cdots (U↓rj + U↓{r-1})j + \cdots + U↓1)j + U↓0 = U(j)$$
|qctr \9skip \qquad and successively put these values onto stack
$U$. (The bottom of stack $U$ now contains $U(0)$, then comes
$U(1)$, etc., with $U(2r)$ on top. Note that
$$$\qquad U(j) ≤ U(2r) < 2↑q\biglp (2r)↑r + (2r)↑{r-1} + \cdots
+ 1\bigrp < 2↑{q+1}(2r)↑r ≤ 2↑p$,
|qctr \9skip \qquad by exercise 3.) Then remove $U↓r \ldotsm
U↓1U↓0$ from stack $C$.
|qleft \qquad \quad \xskip Now the top of stack $C$ contains
another list of $r + 1$ $q$-bit numbers, $V↓r \ldotsm V↓1V↓0$,
and the $p$-bit numbers
$$\qquad (\cdots (V↓rj + V↓{r-1})j + \cdots + V↓1)j + V↓0 = V(j)$$
|qctr \9skip \qquad should be put onto stack $V$ in the same
way. After this has been done, remove $V↓r \ldotsm V↓1V↓0$ from
stack $C$.
\algstep C5. [Recurse.] Successively put the following
items onto stack $C$, at the same time emptyping stacks $U$
and $V:$
\qquad code-2, $V(2r), U(2r)$, code-3, $V(2r - 1), U(2r - 1), \ldotss $,
code-3, $V(1), U(1)$, code-3, $V(0), U(0)$.
|qright \9skip \qquad Put code-4 onto stack $W$. Go back to
step C3.
\algstep C6. [Save one product.] (At this point the multiplication
algorithm has set $W$ to one of the products $W(j) = U(j)V(j).)$
Put $w$ onto stack $W$. (This number $w$ contains 2($q↓k + q↓{k-1})$
bits.) Go back to step C3.
|qleft \12skip |newtype {\bf Fig.\ 8. \xskip} Toom--Cook algorithm
for high-precision multiplication.
\algstep C7. [Find $\theta $'s.] Set $r
← r↓k, q ← q↓k, p ← q↓{k-1} + q↓k$. (At this point stack $W$
contains \ldotss , code-4, $W(0), W(1), \ldotss , W(2r)$ from
bottom to top, where each $W(j)$ is a $2p$-bit number.)
|qleft \qquad \quad \xskip Now for $j = 1, 2, 3, \ldotss , 2r$,
perform the following loop: For $t = 2r, 2r - 1, 2r - 2, \ldotss
, j$ set $W(t) ← (W(t - 1)\bigrp /j$. (Here $j$ must increase
and $t$ must decrease. The quantity $\biglp W(t) - W(t - 1)\bigrp
/j$ will always be a nonnegative integer that fits in 2$p$
bits; cf.\ (15).)
\algstep C8. [Find $W$'s.] For $j = 2r - 1, 2r - 2, \ldotss ,
1$, perform the following loop: For $t = j, j + 1, \ldotss ,
2r - 1$ set $W(t) ← W(t) - jW(t + 1)$. (Here $j$ must decrease
and $t$ must increase. The result of this operation will again
be a nonnegative $2p$-bit integer; cf.\ (17).)
\algstep C9. [Set answer.] Set $w$ to the $2(q↓k + q↓{k+1})$-bit
integer
$$\qquad \biglp \cdots (W(2r)2↑q + W(2r - 1))2↑q + \cdots \bigrp 2↑q + W(0).$$
|qctr \9skip \qquad Remove $W(2r), \ldotss , W(0)$ and code-4
from stack $W$.
\algstep C10. [Return.] Set $k ← k + 1$. Remove
the top of stack $C$. If it is code-3, go to C6. If it is code-2,
put $w$ onto stack $W$ and go to C7. And if it is code-1, terminate
the algorithm ($w$ is the answer).
|qleft \12skip |cancelindent |newtype \quad \xskip Let us now
estimate the running time, $T(n)$, for Algorithm C, in terms
of some things we shall call ``cycles,'' i.e., elementary machine
operations. Step C1 takes $O(q↓k)$ cycles, even if we represent
the number $q↓k$ internally as a long string of $q↓k$ bits followed
by some delimiter, since $q↓k + q↓{k-1} + \cdots + q↓0$ will
be $O(q↓k)$. Step C2 obviously takes $O(q↓k)$ cycles.
Now let $t↓k$ denote the amount of computation required to get
from step C3 to step C10 for a particular value of $k$ (after
$k$ has been decreased at the beginning of step C3). Step C3
requires $O(q)$ cycles at most. Step C4 involves $r$ multiplications
of a $\lg (r + 1)$-bit number by a $p$-bit number, and $r$ additions
of $p$-bit numbers, all repeated $4r + 2$ times. Thus we need
a total of $O(r↑2q$ ln $r)$ cycles. Step C5 requires moving
$4r + 2$ $p$-bit numbers, so it involves $O(rq)$ cycles. Step
C6 requires $O(q)$ cycles, and it is done $2r + 1$ times per
iteration. The recursion involved when the algorithm essentially
invokes itself (by returning to step C3) requires $t↓{k-1}$
cycles, $2r + 1$ times. Step C7 requires $O(r↑2)$ subtractions
of $p$-bit numbers and divisions of $2p$-bit by $(\lg r)$-bit
numbers, so it requires $O(r↑2q$ ln $r)$ cycles. Similarly,
step C8 requires $O(r↑2q$ ln $r)$ cycles. Step C9 involves $O(rq)$
cycles, and C10 takes hardly any time at all.
Summing up, we have, for $q = q↓k$ and $r = r↓k$, $T(n) = O(q↓k)
+ O(q↓k) + t↓{k-1}$, where
$$\eqalign{t↓k ⊗= O(q) + O(r↑2q$ ln $r) + O(rq) + (2r + 1)O(q) + O(r↑2q$ ln $r)\cr
⊗\qquad + O(r↑2q$ ln $r) + O(rq) + O(q) + (2r + 1)t↓{k-1}\cr
⊗= O(r↑2q$ ln $r) + (2r + 1)t↓{k-1};\cr}$$
thus there is a constant $c$ such that
$$t↓k ≤ cr↑{2}↓{k}q↓k$ lg $r↓k + (2r↓k + 1)t↓{k-1}.$$
|qctr \9skip To complete the estimation of $t↓k$ we can prove
by brute force that
$$t↓k ≤ Cq↓{k+1}2↑{2.5\sqrt{\lg q↓{k+1}}}\eqno (18)$$
for some constant $C$. Let us choose $C > 20c$,
and let us also take $C$ large enough so that (18) is valid
for $k ≤ k↓0$, where $k↓0$ will be specified below. Then when
$k > k↓0$, let $Q↓k =$ lg $q↓k$, $R↓k =$ lg $r↓k$; we have by
induction
$$\eqalign{t↓k ⊗≤ cq↓kr↑{2}↓{k}$ lg $r↓k + (2r↓k + 1)Cq↓k2↑{2.5\sqrt{Q↓k}}\cr
⊗= Cq↓{k+1}2↑{2.5\sqrt{Q↓{k+1}}}(\eta ↓1 + \eta ↓2),\cr}$$
where
$$\eqalign{\eta ↓1 ⊗= {c\over C}R↓k2↑{R↓k-2.5\sqrt{Q↓{k+1}}} < {1\over 20}R↓k2↑{-R↓k} < 0.05,\cr
\eta ↓2 ⊗= \left(2 + {1\over r↓k}\right)2↑{2.5(\sqrt{Q↓k}-\sqrt{Q↓{k+1}})} → 2↑{-1/4} < 0.85,\cr}$$
since
$$\sqrt{Q↓{k+1}} - \sqrt{Q↓k} = \sqrt{Q↓k + \lfloor \sqrt{Q↓k}\rfloor} - \sqrt{Q↓k} → {1\over 2}$$
as $k → ∞$. It follows that we can find $k↓0$ such
that $\eta ↓2 < 0.95$ for all $k > k↓0$, and this completes
the proof of (18) by induction.
Finally, therefore, we may compute $T(n)$; since $n > q↓{k-1}
+ q↓{k-2}$, we have $q↓{k-1} < n$; hence
$$r↓{k-1} = 2↑{\lfloor \sqrt{\lg q↓{k-1}}\rfloor} < 2↑{\sqrt{\lg n}},\qquad$ and\qquad $q↓k = r↓{k-1}q↓{k-1} < n2↑{\sqrt{\lg n}}.$$
Thus
$$t↓{k-1} ≤ Cq↓k2↑{2.5\sqrt{\lg q↓k}} < Cn2↑{\sqrt{\lg n}+2.5(\sqrt{\lg n}+1)},$$
and, since $T(n) = O(q↓k) + t↓{k-1}$, we have finally
the following theorem:
\thbegin Theorem C. {\sl There is a constant $c↓0$ such
that the execution time of Algorithm C is less than
$c↓0n2↑{3.5\sqrt{\lg n}}$ cycles.}
|qleft \12skip |newtype This result is noticeably stronger than
Theorem A, since $n2↑{3.5\sqrt{\lg n}} = n↑{1+3.5/\sqrt{\lg n}}$. By adding
a few complications to the algorithm, pushing the ideas to their
apparent limits (see exercise 5), we can improve the estimated
execution time to
$$T(n) = O(n2↑{\sqrt{2\lg n}}\log n).\eqno (19)$$
\subsectionbegin{B. A modular method} There is another
way to multiply large numbers very rapidly, based on the ideas
of modular arithmetic as presented in Section 4.3.2. It is very
hard to believe at first that this method can be of advantage,
since a multiplication algorithm based on modular arithmetic
must include the choice of moduli and the conversion of numbers
into and out of modular representation, besides the actual multiplication
operation itself. In spite of these formidable difficulties,
A. Sch\"onhage discovered that all of these operations can be carried out quite rapidly.
%folio 382 galley 13 (C) Addison-Wesley 1978
|qleft \20skip |newtype \quad \xskip In order to understand
the essential mechanism of Sch\"onhage's method, we shall
look at a special case. Consider the sequence defined by the
rules
$$q↓0 = 1,\qquad q↓{k+1} = 3q↓k - 1,\eqno (20)$$
so that $q↓k = 3↑k - 3↑{k-1} - \cdots - 1 = {1\over 2}(3↑k + 1)$.
We will study a procedure that multiplies $(18q↓k + 8)$-bit
numbers, in terms of a method for multiplying $(18q↓{k-1} + 8)$-bit
numbers. Thus, if we know how to multiply numbers
having $(18q↓0 + 8) = 26$ bits, the procedure to be described
will show us how to multiply numbers of $(18q↓1 + 8) = 44$ bits,
then 98 bits, then 260 bits, etc., eventually increasing the
number of bits by almost a factor of 3 at each step.
Let $p↓k = 18q↓k + 8$. When multiplying $p↓k$-bit numbers, the
idea is to use the six moduli
$$m↓1 = 2↑{6q↓k-1} - 1,\qquad m↓2 = 2↑{6q↓k+1} - 1,\qquad m↓3 = 2↑{6q↓k+2} - 1,$$
$$m↓4 = 2↑{6q↓k+3} - 1,\qquad m↓5 = 2↑{6q↓k+5} - 1,\qquad m↓6 = 2↑{6q↓k+7} - 1.\eqno (21)$$
|qctr \9skip These moduli are relatively prime, by Eq.\ 4.3.2--18,
since the exponents
$$$6q↓k - 1,\qquad 6q↓k + 1,\qquad 6q↓k + 2,\qquad 6q↓k + 3,\qquad
6q↓k + 5,\qquad 6q↓k + 7\eqno (22)$
|qctr \9skip are always relatively prime (see exercise 6). The
six moduli in (21) are capable of representing numbers up to
$m = m↓1m↓2m↓3m↓4m↓5m↓6 > 2↑{36q}|infsup k↑{+16} = 2↑{2p}|infsup
k$, so there is no chance of overflow in the multiplication
of $p↓k$-bit numbers $u$ and $v$. Thus we may use the following
method:
$$|indent a) Compute $u↓1 = u$ mod $m↓1, \ldotss , u↓6 = u$ mod
$m↓6; v↓1 = v$ mod $m↓1, \ldotss , v↓6 = v$ mod $m↓6$.
|qleft b) Multiply $u↓1$ by $v↓1, u↓2$ by $v↓2, \ldotss , u↓6$
by $v↓6$. These are numbers of at most $6q↓k + 7 = 18q↓{k-1}
+ 1 < p↓{k-1}$ bits, so the multiplications can be performed
by using the assumed $p↓{k-1}$-bit multiplication procedure.
|qleft c) Compute $w↓1 = u↓1v↓1$ mod $m↓1, w↓2 = u↓2v↓2$ mod
$m↓2, \ldotss , w↓6 = u↓6v↓6$ mod $m↓6$.
|qleft d) Compute $w$ such that 0 ≤ $w < m, w$ mod $m↓1 = w↓1,
\ldotss , w$ mod $m↓6 = w↓6$.
|qleft \12skip |cancelindent \quad \xskip Let $t↓k$ be the amount
of time needed for this process. It is not hard to see that
operation (a) takes $O(p↓k)$ cycles, since the determination
of $u$ mod $(2↑e - 1)$ is quite simple (like ``casting out nines''),
as shown in Section 4.3.2. Similarly, operation (c) takes $O(p↓k)$
cycles. Operation (b) requires essentially $6t↓{k-1}$ cycles.
This leaves us with operation (d), which seems to be quite a
difficult computation; but Sch\"onhage has found an ingenious
way to perform step (d) in $O(p↓k$ log $p↓k)$ cycles, and this
is the crux of the method. As a consequence, we have
$$t↓k = 6t↓{k-1} + O(p↓k$ log $p↓k).$$
Since $p↓k = 3↑{k+2} + 17$, we can show that
$$t↓k = O(6↑k) = O(p↑{1.63}↓{k}).\eqno (23)$$
|qctr \9skip (See exercise 7.)
|qleft \12skip \quad \xskip So although this method is more
complicated than the $O(n↑{\lg 3})$ procedure given at the beginning
of the section, it does, in fact, lead to an execution time
substantially better than $O(n↑2)$ for the multiplication of
$n$-bit numbers. Thus we can improve on the classical method
by using either of two completely different approaches.
Let us now analyze operation (d) above. Assume that we are given
the positive integers $e↓1 < e↓2 < \cdots < e↓r$, relatively
prime in pairs; let
$$m↓1 = 2↑{e↓1} - 1,\qquad m↓2 = 2↑{e↓2} - 1,\qquad \ldotss ,\qquad m↓r = 2↑{e↓r} - 1.\eqno (24)$$
|qctr \9skip We are also given numbers $w↓1, \ldotss , w↓r$ such
that $0 ≤ w↓j ≤ m↓j$. Our job is {\sl to determine the binary
representation of the number w that satisfies the conditions}
$$0 ≤ w < m↓1m↓2 \ldotsm m↓r,$$
$$w ≡ w↓1\quad$ (modulo $m↓1),\qquad \ldotss ,\qquad w ≡ w↓r\quad$ (modulo $m↓r).\eqno (25)$$
|qctr \9skip The method is based on (23) and (24) of Section
4.3.2; first we compute
$$w↑\prime↓j = \biglp \cdots \biglp (w↓j - w↑\prime↓1)c↓{1j} - w↑\prime↓2\bigrp c↓{2j} - \cdots - w↑\prime↓{j-1}\bigrp c↓{(j-1)j}$ mod $m↓j,\eqno (26)$$
for $j = 2, \ldotss , r$, where $w↑\prime↓1 = w↓1$ mod $m↓1$; then we compute
$$w = \biglp \cdots (w↑\prime↓rm↓{r-1} + w↑\prime↓{r-1})m↓{r-2} + \cdots + w↑\prime↓2\bigrp m↓1 + w↑\prime↓1.\eqno (27)$$
|qctr \9skip Here $c↓{ij}$ is a number such that $c↓{ij}m↓i
≡ 1$ (modulo $m↓j)$; these numbers $c↓{ij}$ are not given; they
must be determined from the $e↓j$'s.
The calculation of (26) for all $j$ involves $\left({r\atop 2}\right)$ additions
modulo $m↓j$, each of which takes $O(e↓r)$ cycles, plus $\left({r\atop 2}\right)$
multiplications by $c↓{ij}$, modulo $m↓j$. The calculation of
$w$ by formula (27) involves $r$ additions and $r$ multiplications
by $m↓j$; it is easy to multiply by $m↓j$, since this is just
adding, shifting, and subtracting, so it is clear that the evaluation
of Eq.\ (27) takes $O(r↑2e↓r)$ cycles. We will soon see that
each of the multiplications by $c↓{ij}$, modulo $m↓j$, requires
only $O(e↓r$ log $e↓r)$ cycles, and therefore {\sl the entire
job of conversion can be done in $O(r↑2e↓r$ log $e↓r)$ cycles}.
|qleft \quad \xskip The above observations leave us with the
following problem to solve: Given positive integers $e < f$
and a nonnegative integer $u < 2↑f$, compute $(cu)$ mod $(2↑f - 1)$,
where $c$ is the number such that $(2↑e - 1)c ≡ 1$ (modulo
$2↑f - 1)$; and we must do this in $O(f$ log $f)$ cycles. The
result of exercise 4.3.2--6 gives a formula for $c$ which suggests
a procedure that can be used. First we find the least positive
integer $b$ such that
$$$be ≡ 1\quad$ (modulo $f).\eqno (28)$
This can be done using Euclid's algorithm in $O\biglp
($log $f)↑3\bigrp$ cycles, since Euclid's algorithm applied
to $e$ and $f$ requires $O($log $f)$ iterations, and each iteration
requires $O\biglp ($log $f)↑2\bigrp$ cycles; alternatively,
we could be very sloppy here without violating the total time
constraint, by simply trying $b = 1, 2$, etc., until (28) is
satisfied, and such a process would take $O(f$ log $f)$ cycles
in all. Once $b$ has been found, exercise 4.3.2--6 tells us
that
$$c = c[b] = \left(\sum ↓{0≤j<b}2↑{ej}\right)$ mod $(2↑f - 1).\eqno (29)$$
\quad \xskip A brute-force multiplication of $(cu)$
mod $(2↑f - 1)$ would not be good enough to solve the problem,
since we do not know how to multiply general $f$-bit numbers
in $O(f$ log $f)$ cycles. But the special form of $c$ provides
a clue: The binary representation of $c$ is composed of bits
in a regular pattern, and Eq.\ (29) shows that the number $c[2b]$
can be obtained in a simple way from $c[b]$. This suggests
that we can rapidly multiply a number $u$ by $c[b]$ if we
build $c[b]u$ up in lg $b$ steps in a suitably clever manner,
such as the following: Let the binary notation for $b$ be
$$$b = (b↓s \ldotsm b↓2b↓1b↓0)↓2$;
|qctr \9skip we may calculate the sequences $a↓k, d↓k, u↓k,
v↓k$ defined by the rules
$$\eqalign{a↓0 ⊗= e,\hfill a↓k = 2a↓{k-1}$ mod $f;\cr
d↓0 ⊗= b↓0e,\hfill d↓k = (d↓{k-1} + b↓ka↓k)$ mod $f;\cr
u↓0 ⊗= u,\hfill u↓k = (u↓{k-1} + 2↑{a↓{k-1}}u↓{k-1})$ mod $(2↑f - 1);\cr
v↓0 ⊗= b↓0u,\hfill v↓k = (v↓{k-1} + b↓k2↑{d↓{k-1}}u↓k)$ mod $(2↑f - 1).\cr}\eqno (30)$$
\9skip$ It is easy to prove by induction on $k$ that
$$
|qctr \hfill a$↓k ⊗= (2↑ke)$mod $f;\hfill d↓k ⊗= \biglp (b↓k
\ldotsm b↓1b↓0)↓2e\bigrp$ mod $f;\cr
\4skip \hfill u↓k ⊗= (c[2↑k]u)$mod(2$↑f - 1);\hfill v↓k ⊗= \biglp
c[(b↓k \ldotsm b↓1b↓0)↓2]u\bigrp$ mod(2$↑f - 1).\eqno (31)\cr
\9skip$ Hence the desired result, ({\sl c[b]u)}mod(2$↑f - 1)$,
is $v↓s$. The calculation of $a↓k, d↓k, u↓k, v↓k$ from $a↓{k-1},
d↓{k-1}, y↓{k-1}, v↓{k-1}$ takes $O($log {\sl fHence the desired
result, (c[b]u)}mod(2$↑f - 1)$, is $v↓s$. The calculation of
$a↓k, d↓k, u↓k, v↓k$ from $a↓{k-1}, d↓{k-1}, y↓{k-1}, v↓{k-1}$
takes $O($log $f) + O($log $f) + O(f) + O(f) = O(f)$ cycles,
and therefore the entire calculation can be done in $sO(f) =
O(f$ log $f)$ cycles as desired.
The reader will find it instructive to study the ingenious method
represented by (30) and (31) very carefully. Similar techniques
are discussed in Section 4.6.3.
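As an aid to such study, here is scheme (30) transcribed into Python (the
variable names are ours). Each shift-and-add line costs $O(f)$ bit
operations, and the loop is traversed about $\lg b ≤ \lg f$ times, in
accordance with the analysis above.

def mul_by_c(u, e, f):                 # returns (c[b]u) mod (2^f - 1)
    M = (1 << f) - 1
    b = pow(e, -1, f)                  # (28); requires gcd(e, f) = 1
    a, uu = e % f, u % M               # a_0 and u_0
    d, v = (b & 1) * e % f, (b & 1) * u % M    # d_0 and v_0
    k = 1
    while b >> k:                      # visit bits b_1, b_2, ..., b_s
        uu = (uu + ((uu << a) % M)) % M    # u_k = (u_{k-1} + 2^{a_{k-1}} u_{k-1}) mod M
        a = 2 * a % f                      # a_k = 2a_{k-1} mod f
        if (b >> k) & 1:
            v = (v + ((uu << d) % M)) % M  # v_k = (v_{k-1} + 2^{d_{k-1}} u_k) mod M
            d = (d + a) % f                # d_k = (d_{k-1} + b_k a_k) mod f
        k += 1
    return v                           # v_s, cf. (31)

# mul_by_c(5, 3, 5) == 14 == (9*5) mod 31, agreeing with c_of(3, 5) == 9.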
%folio 385 galley 14 (C) Addison-Wesley 1978 *
Sch\"onhage's paper [{\sl Computing \bf 1} (1966), 182--196]
shows that these ideas can be extended to the multiplication
of $n$-bit numbers using $r \approx 2↑{\sqrt{2\lg n}}$ moduli, obtaining
a method analogous to Algorithm C. We shall not dwell on the
details here, since Algorithm C is always superior; in fact,
an even better method is next on our agenda.
\subsectionbegin{C. Use of Fourier transforms} The
critical problem in high-precision multiplication is the determination
of ``convolution products'' such as
$$u↓rv↓0 + u↓{r-1}v↓1 + \cdots + u↓0v↓r,$$
and there is an intimate relation between
convolutions and finite Fourier transforms. If $\omega =\exp(2πi/K)$
is a $K$th root of unity, the one-dimensional Fourier transform
of $(u↓0, u↓1, \ldotss , u↓{K-1})$ may be defined to be $(
{\A u}↓0,{\A u}↓1, \ldotss , {\A u}↓{\!K-1})$, where
$${\A u}↓s = \sum ↓{0≤t<K}\omega ↑{st}u↓t,\qquad 0 ≤ s <
K.\eqno (32)$$
Letting $({\A v}↓0, {\A v}↓1, \ldotss ,
{\A v}↓{K-1})$ be defined in the same way, as the Fourier transform of $(v↓0,
v↓1, \ldotss , v↓{K-1})$, it is not difficult to see that $(
{\A u}↓0{\A v}↓0, {\A u}↓1{\A v}↓1, \ldotss , {\A u}↓{\!K-1}
{\A v}↓{\!K-1})$ is the transform of $(w↓0, w↓1, \ldotss , w↓{K-1})$,
where
$$\eqalign{w↓r⊗ = u↓rv↓0 + u↓{r-1}v↓1 + \cdots + u↓0v↓r + u↓{K-1}v↓{r+1}
+ \cdots + u↓{r+1}v↓{K-1}\cr
\noalign{\vskip 7pt}
⊗ = \sum ↓{i+j≡r\modulo K}u↓iv↓j.\cr}$$
When $K ≥ 2n-1$ and $u↓n = u↓{n+1} = \cdots = u↓{K-1}
= v↓n = v↓{n+1} = \cdots = v↓{K-1} = 0$, the $w$'s are just what we
need for multiplication; {\sl the transform of a convolution
product is the ordinary product of the transforms.} This idea
is actually a special case of Toom's use of polynomials $\biglp$cf.\ (10)$\bigrp
$, with $x$ replaced by roots of unity.
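The following Python fragment makes the property concrete. It uses the
plain $O(K↑2)$ definition (32) over the complex numbers (the fast version
comes later); the inverse transform, which divides by $K$ and negates the
exponents, is a convention of ours that the text has not needed to spell
out.

import cmath

def dft(a, inverse=False):             # definition (32), omega = exp(2 pi i/K)
    K, sgn = len(a), (-1 if inverse else 1)
    b = [sum(x * cmath.exp(sgn * 2j * cmath.pi * s * t / K)
             for t, x in enumerate(a)) for s in range(K)]
    return [x / K for x in b] if inverse else b

# Transform, multiply termwise, transform back: the result is the cyclic
# convolution, w_r = sum of u_i v_j over i + j ≡ r (modulo K).
u, v = [3, 1, 4, 0], [1, 5, 0, 0]
w = dft([x * y for x, y in zip(dft(u), dft(v))], inverse=True)
print([round(x.real) for x in w])      # prints [3, 16, 9, 20]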
The above property of Fourier transforms was exploited by V.
Strassen in 1968, using a sufficiently precise binary representation
of the complex number $\omega $, to multiply large numbers faster
than was possible under all previously known schemes. In 1970,
he and A. Sch\"onhage found an elegant way to modify the
method, avoiding all the complications of complex numbers and
obtaining a very pretty algorithm capable of multiplying two
$n$-bit numbers in $O(n\log n\log\log n)$ steps. We shall now
study their remarkable approach [cf.\ {\sl Computing \bf 7} (1971),
281--292], in a simplified form suggested by V. R. Pratt.
It is convenient in the first place to replace $n$ by $2↑n$,
and to seek a procedure that multiplies $2↑n$-bit numbers in
$O(2↑n\,n\log n)$ steps. Roughly speaking, we shall reduce
the problem of multiplying $2↑n$-bit numbers to the problem
of doing about $2↑{n/2}$ multiplications of $2↑{n/2}$-bit numbers,
with $O(2↑n n)$ auxiliary steps required to piece these products
together properly; then there will be $\lg n$ levels of recursion
with $O(2↑n n)$ steps per level, making a total of $O(2↑n\,n
\log n)$ steps as desired.
Let $N = 2↑n$ and suppose we wish to compute the product of
$u$ and $v$, where $0 ≤ u, v < 2↑N$. As in Algorithm C we shall
break these $N$-bit numbers into groups; let
$$\baselineskip16pt\vjust{\halign{$\hfill\dispstyle{#}\hfill$\cr
k + l = n + 1,\cr
K = 2↑k,\qquad L = 2↑l,\cr}}$$
and write
$$u = (U↓{K/2-1} \ldotsm U↓1U↓0)↓{2↑L},\qquad v = (V↓{K/2-1}
\ldotsm V↓1V↓0)↓{2↑L},$$
regarding $u$ and $v$ as $2↑{k-1}$ groups of $2↑l$-bit
numbers. We will select appropriate values for $k$ and $l$ later;
it turns out (see exercise 10) that we will need to have
$$1 ≤ k ≤ l + 3≤n,\eqno (33)$$
but no other conditions. The above representation
of $u$ and $v$ implies as before that
$$u \cdot v = W↓{K-2}2↑{(K-2)L} + \cdots + W↓12↑L + W↓0,\eqno
(34)$$
where
$$W↓r = \sum ↓{i+j=r}U↓iV↓j = \sum ↓{i+j≡r\modulo K}U↓iV↓j,\eqno
(35)$$
if we define $U↓i = V↓j = 0$ for $i, j ≥ K/2$.
Clearly $0 ≤ W↓r ≤ (K/2)(2↑L - 1)↑2 < 2↑{2L+k-1}$; therefore
if we knew the $W$'s, we could compute $u \cdot v$ by adding
up the terms in (34), in $O\biglp K(2L + k)\bigrp = O(N)$ further
steps.
Our goal is to compute the $W↓r$ exactly; and we can do this
by computing their value mod $M$, where $M$ is any number larger
than $(K/2)(2↑L - 1)↑2$. The key idea is that we can choose
$M = 2↑{4L} + 1$, and compute the $W↓r$ by doing a ``fast Fourier
transform'' modulo $M$, where the $K$th root of unity $\omega$
we use is a power of 2 (so that multiplication by powers of
$\omega$ is very simple).
Before discussing this idea in detail, a numerical example of
what we've said so far may help to fix the ideas. Suppose that
we want to multiply two 4096-bit numbers, obtaining an 8192-bit
product; thus $n = 12$ in the above discussion, and we may choose
$k = 8$, $l = 5$. The bits of $u$ and $v$ are partitioned into
128 groups of 32 bits each, and the basic idea is to find the
256 convolution products (35) and to add them together (after
appropriate shifting). The convolution products have at most
$64 + 7 = 71$ bits each, so it surely suffices to determine them
modulo $M = 2↑{128} + 1$. We will see that it is possible to
find the convolution products rapidly by first computing their
finite Fourier transform mod $M$, using the integer $\omega
= 2$ as a 256th ``root of unity.'' These integer calculations
mod $M$ turn out to have all the necessary properties of complex
roots of unity in the ordinary Fourier transform (32).
Arithmetic mod $(2↑m + 1)$ is somewhat similar to ones' complement
arithmetic, mod $(2↑m - 1)$, although it is slightly more complicated;
we have already investigated the idea briefly in Section 3.2.1.1.
Numbers can be represented as $m$-bit quantities in binary notation,
except for the special value $-1 ≡ 2↑m$, which may be represented
in some special way. Addition mod $(2↑m + 1)$
is easily done in $O(m)$ cycles, since
a carry off the left end merely means that we must subtract 1 at
the right; similarly, subtraction mod $(2↑m + 1)$ is quite simple.
Furthermore, we can multiply by $2↑r$ in $O(m)$ cycles, when
$0 ≤ r < m$, since
$$2↑r \cdot (u↓{m-1} \ldotsm u↓1u↓0)↓2 ≡ (u↓{m-r-1} \ldotsm
u↓0\,0 \ldotsm 0)↓2 - (0 \ldotsm 0\,u↓{m-1}\ldotsm u↓{m-r})↓2.\eqno(36)$$
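These operations are easily expressed in code. The following Python
helpers (names ours, with ordinary integers standing in for the $m$-bit
registers) exhibit addition mod $(2↑m + 1)$ and the shift-split-subtract
multiplication (36):

def add_mod(a, b, m):                  # (a + b) mod (2^m + 1)
    s = a + b                          # a carry off the left end means:
    return s - (1 << m) - 1 if s > (1 << m) else s   # subtract 1 at the right

def mul_pow2(u, r, m):                 # (2^r u) mod (2^m + 1), 0 <= r < m; Eq. (36)
    lo = (u << r) & ((1 << m) - 1)     # (u_{m-r-1} ... u_0 0 ... 0)_2
    hi = u >> (m - r)                  # (0 ... 0 u_{m-1} ... u_{m-r})_2
    return (lo - hi) % ((1 << m) + 1)

# Example: m = 4, so M = 17; mul_pow2(9, 2, 4) == 2 == (4*9) mod 17.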
%folio 388 galley 15 (C) Addison-Wesley 1978 *
Given a sequence of $K = 2↑k$ integers
$(a↓0, \ldotss , a↓{K-1})$, and an integer $\omega$ such that
$\omega ↑K ≡ 1\modulo M$, the integer Fourier transform
$${\A a}↓s =\bigglp\sum ↓{0≤t<K}\omega ↑{st}a↓t\biggrp\mod M,\qquad
0 ≤ s < K,\eqno (37)$$
can be calculated rapidly as follows. (In these
formulas the $s↓j$ and $t↓j$ are either 0 or 1, so that each
step represents $2↑k$ computations.)
\yskip\noindent Step 0.\xskip Let $A↑{[0]}(t↓{k-1},
\ldotss , t↓0) = a↓t$, where $t = (t↓{k-1} \ldotsm t↓0)↓2$.
\yskip\noindent Step 1.\xskip Set $A↑{[1]}(s↓{k-1},t↓{k-2}, \ldotss , t↓0)←$
\penalty1000\vskip2pt
\hjust to size{\hfill$ \biglp A↑{[0]}(0, t↓{k-2}, \ldotss , t↓0) +
\omega ↑{(s↓{k-1}0\ldotsm0)↓2} \cdot A↑{[0]}(1, t↓{k-2}, \ldotss
, t↓0)\bigrp \mod M$.}
\yskip\noindent Step 2.\xskip Set $A↑{[2]}(s↓{k-1},
s↓{k-2}, t↓{k-3}, \ldotss , t↓0)←$
\penalty1000\vskip2pt
\hjust to size{\hfill$ \biglp A↑{[1]}(s↓{k-1}, 0, t↓{k-3}, \ldotss
, t↓0) + \omega ↑{(s↓{k-2}s↓{k-1}0 \ldotsm 0)↓2} \cdot A↑{[1]}(s↓{k-1},
1, t↓{k-3}, \ldotss , t↓0)\bigrp \mod M$.}
\yskip$\ldots$
\yskip\noindent Step $k$.\xskip Set $A↑{[k]}(s↓{k-1},
\ldotss , s↓1, s↓0)←$
\penalty1000\vskip2pt
\hjust to size{\hfill$\biglp A↑{[k-1]}(s↓{k-1}, \ldotss , s↓1, 0)
+ \omega ↑{(s↓0s↓1 \ldotsm s↓{k-1})↓2} \cdot A↑{[k-1]}(s↓{k-1},
\ldotss , s↓1, 1)\bigrp \mod M$.}
\yskip\noindent It is not difficult to prove by induction that
we have
$$\twoline{A↑{[j]}(s↓{k-1}, \ldotss , s↓{k-j}, t↓{k-j-1}, \ldotss
, t↓0)}{7pt}{\null=\sum ↓{0≤t↓{k-1}, \ldotss , t↓{k-j}≤1}\omega
↑{(s↓0s↓1 \ldotsm s↓{k-1})↓2 \cdot (t↓{k-1} \ldotsm t↓{k-j}0 \ldotsm
0)↓2} a↓t \mod M,}$$
so that
$$A↑{[k]}(s↓{k-1}, \ldotss , s↓1, s↓0) = {\A a}↓s,\qquad
\hjust{where $s = (s↓0s↓1 \ldotsm s↓{k-1})↓2$.}$$
(Note the reversed order of the binary digits in
$s$. For further discussion of transforms such as this, see
Section 4.6.4.)
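Here, for instance, is how Steps 0 through $k$ might be programmed. This
Python sketch (names ours) packs the $s$ and $t$ bits into a single array
index exactly as in the formulas, and returns the transformed values in
the bit-reversed order just noted; in the application below, $\omega$ is
a power of 2, so each pow call would really be a shift as in (36).

def integer_fft(a, omega, k, M):       # a has length K = 2^k; omega^K ≡ 1 (mod M)
    K = 1 << k
    A = list(a)                        # Step 0: A[(t_{k-1} ... t_0)_2] = a_t
    for j in range(1, k + 1):
        p = k - j                      # Step j consumes index bit position p
        B = [0] * K
        for i in range(K):
            if (i >> p) & 1: continue  # handle each 0/1 pair of entries once
            e = 1 << (k - 1)           # exponent (s_{k-j} s_{k-j+1} ... s_{k-1} 0...0)_2,
            for q in range(1, j):      # taking s_{k-j} = 1; earlier s bits sit in i
                e |= ((i >> (p + q)) & 1) << (k - 1 - q)
            lo, hi = A[i], A[i | 1 << p]
            B[i] = (lo + pow(omega, e ^ (1 << (k - 1)), M) * hi) % M   # s_{k-j} = 0
            B[i | 1 << p] = (lo + pow(omega, e, M) * hi) % M           # s_{k-j} = 1
        A = B
    return A   # A[(s_{k-1} ... s_0)_2] = â_s, where s = (s_0 s_1 ... s_{k-1})_2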
Now we have enough machinery at
our disposal to do the calculation of all $W↓r$ as promised.
Let $\omega = 2↑{2↑{l+3-k}}$, so that $\omega ↑K = 2↑{8L}
≡ 1\modulo M$, where $M = 2↑{4L} + 1$. The integer fast
Fourier transform algorithm above can be applied to $(U↓0, \ldotss
, U↓{K-1})$ to obtain $({\A U}↓{\!0}, \ldotss , {\A U}↓{\!K-1})$;
each of the $k$ steps involves $2↑k$ computations of the form
$c = (a + 2↑eb)\mod M$, so the running time is $O(k2↑kL) =
O(kN)$. Similarly we obtain $({\A V}↓{\!0}, \ldotss , {\A V}↓{\!K-1})$
in $O(kN)$ steps. The next step is to compute
$$(a↓0, a↓1, \ldotss , a↓{K-1}) = (U↓0V↓0, U↓1V↓1, \ldotss ,
U↓{K-1}V↓{K-1})\mod M,$$
using a high-speed multiplication procedure for
each of these products, obtaining the results mod $M$ by subtracting
the most significant halves from the least significant halves.
If we now use the fast Fourier transform a third time, obtaining
$({\A a}↓0, {\A a}↓1, \ldotss , {\A a}↓{K-1})$, this
is enough to determine $(W↓0, W↓1, \ldotss , W↓{K-1})$ without
much more work, since we shall prove that
$$2↑kW↓r ≡ {\A a}↓{(-r)\mod K}\modulo M.\eqno (38)$$
This congruence means that an appropriate shifting
operation, namely to multiply $-{\A a}↓{(-r)\mod K}$ by $2↑{4L-k}\mod
M$ as in (36), finally yields $W↓r$.
All this may seem like magic, but it works;
a careful study of the above remarks will show that the method
is very clever but not a complete mystery. The proof of (38)
relies primarily on the fact that $\omega ↑{K/2} ≡ -1\modulo
M$, because this fact can be used to prove that
$$\sum ↓{0≤t<K}\omega ↑{st} = \left\{\vcenter{\baselineskip14pt
\halign{$#$,\hfill⊗\qquad if $s\mod K#$\hfill\cr
K⊗=0;\cr 0⊗≠0.\cr}}\right.\eqno(39)$$
For when $s \mod K ≠ 0$, let $s \mod K = 2↑pq$
where $q$ is odd and $0 ≤ p < k$. Setting $T = 2↑{k-1-p}$, we
have $\omega ↑{sT} ≡ \omega ↑{qK/2} ≡ -1$; hence $\omega ↑{s(t+T)}
≡ -\omega↑{st}$, and terms $T$ steps apart in the sum (39) cancel in pairs.
From (39) we deduce (38) as follows:
$$\eqalign{{\A a}↓s\;⊗≡\;\sum↓{0≤t<K}\omega↑{st}{\A U}↓{\!t}{\A V}↓{\!t}\;≡\;\sum
↓{0≤t,i,j<K}\omega↑{st}\,\omega↑{ti}U↓i\;\omega↑{tj}V↓j\cr
\noalign{\vskip7pt}
⊗\;≡\sum↓{0≤i,j<K}U↓iV↓j\sum↓{0≤t<K}\omega↑{(s+i+j)t}\;≡\;K
\sum↓{\scriptstyle0≤i,j<K\atop\scriptstyle i+j+s≡0\modulo K}U↓iV↓j.\cr}$$
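In Python, the whole plan (minus the recursion) can now be written out as
follows, using integer_fft from the sketch above; rev undoes the bit
reversal, Python's built-in multiplication stands in for the $2↑k$
recursive multiplications of numbers in $[0, 2↑{4L}]$, and for $2↑n$-bit
inputs one would choose $k$ and $l$ by Eq.\ (40) below. This is a sketch
of the method, not a tuned implementation.

def rev(i, k):                         # reverse the k bits of i
    return int(format(i, '0%db' % k)[::-1], 2)

def convolution_products(u, v, k, l):  # the W_r of (35); assumes (33)
    K, L = 1 << k, 1 << l
    M = (1 << (4 * L)) + 1
    omega = 1 << (1 << (l + 3 - k))    # omega = 2^{2^{l+3-k}}, omega^K = 2^{8L} ≡ 1
    mask = (1 << L) - 1
    U = [(u >> (i * L)) & mask for i in range(K)]   # K/2 groups, then zeros
    V = [(v >> (i * L)) & mask for i in range(K)]
    def fft(a):
        A = integer_fft(a, omega, k, M)
        return [A[rev(s, k)] for s in range(K)]     # natural order
    ahat = fft([x * y % M for x, y in zip(fft(U), fft(V))])
    # (38): 2^k W_r ≡ â_{(-r) mod K}, and 2^{-k} ≡ -2^{4L-k} (modulo M)
    return [-(ahat[-r % K] << (4 * L - k)) % M for r in range(K)]

# Then sum(W << (r*L) for r, W in enumerate(convolution_products(u, v, k, l)))
# reproduces u*v, as in (34).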
The multiplication procedure is
nearly complete; it remains for us to specify $k$ and $l$, and
to total up the amount of work involved. Let $M(n)$ denote the
time it takes to multiply $2↑n$-bit numbers by the above method,
and let $M↑\prime (n) = M(n)/2↑n$. The calculation time involves
$O(kN)$ steps for the three Fourier transforms and the other
auxiliary operations, plus $2↑k$ multiplications of
integers in the interval $[0, 2↑{4L}]$, hence we have
$$M(n) = 2↑kM(l + 2) + O(kN);\qquad M↑\prime (n) = 2M↑\prime
(l + 2) + O(k).$$
We get the best reduction of $M↑\prime (n)$ when
$l$ is chosen to be as low as possible, consistent with (33),
so we set
$$k = \lfloor n/2\rfloor + 2,\qquad l = \lceil n/2\rceil -
1.\eqno (40)$$
We have proved that there is a constant
$C$ such that
$$M↑\prime (n) ≤ 2M↑\prime (\lceil (n - 2)/2\rceil + 2) + Cn,\qquad
\hjust{for all $n ≥ 4$.}$$
Iterating this relation (cf.\ exercise 1.2.4--35)
yields
$$M↑\prime (n) ≤ 2↑jM↑\prime (\lceil (n - 2)/2↑j\rceil + 2)
+ C(2↑{j-1}j\lceil (n - 2)/2↑{j-1}\rceil + 2↑{j+1} - 2),$$
for $j = 1$, 2, $\ldotss$, $\lceil\lg (n - 2)\rceil
$; and $j = \lceil\lg(n - 2)\rceil$ yields $M↑\prime (n)
= O(n\log n)$. We have proved the main result of this section:
\algbegin Theorem S ({\rm A. Sch\"onhage, V. Strassen}).
{\sl It is possible to multiply two $n$-bit numbers in $O(n\log
n\log\log n)$ steps.}
\yyskip
Our formulation of the multiplication
procedure was designed primarily for simplicity of exposition;
it does not turn out to be an especially fast method for small
$n$; for example, a lot of the bits that the above method deals
with are known to be zero. Thus the algorithm needs to be refined
somewhat if it is ever to become competitive with Algorithm
C when $n$ is in a practical range. As $n → ∞$, of course, fast
Fourier multiplication becomes vastly superior to Algorithm
C. John Pollard has presented a fast Fourier multiplication
algorithm that is useful for moderately large $n$, in {\sl
Math.\ Comp.\ \bf 25} (1971), 365--374.
%folio 392 galley 16 (C) Addison-Wesley 1978
The word ``steps'' in Theorem S has been used somewhat loosely;
we have implicitly been assuming a ``conventional computer''
with unlimited random-access memory, which takes one unit of
time to read and write any bit. This assumption is quite unrealistic
as $n → ∞$, since we need $O(\log n)$ bits in an instruction
or an index register just to be able to distinguish between
$n$ memory cells, so
the actual time to access memory on a ``conventional computer''
is really proportional to $\log n$. We often forget this dependence
on $n$ because it does not occur on real machines with bounded
memory and bounded register size. When $n$ becomes really large
the only physically appropriate model seems to be a finite memory
with a finite number of arbitrarily long tapes; the fast Fourier
multiplication algorithm above can in fact be carried out under
this restriction, within the same $O(n\log n\log\log n)$ bound.
The difference between
these computer models can be clarified by considering another
method due to Sch\"onhage and Strassen: If $n = 2↑m \cdot
m$, so that $m \approx \lg n$ and $2↑m \approx n/\lg n$, it
is possible to use the fast Fourier transform over the complex
numbers to compute the product of two $n$-bit numbers by doing
$O(m \cdot 2↑m)$ multiplications of $6m$-bit numbers. Each of
the latter can be broken into $12↑2 = 144$ multiplications of
$({1\over 2}m)$-bit numbers. Now we can construct a multiplication
table containing all products $xy$ with $0 ≤ x, y < 2↑{(1/2)m}$,
by repeated addition, in $O(m \cdot 2↑{(1/2)m} \cdot 2↑{(1/2)m})$
steps; then each of the $O(m \cdot 2↑m)$ needed products can
be done by table lookup in $O(m)$ steps. The total number of
steps for {\sl this} procedure therefore comes to $O(m↑22↑m)
= O(n\log n)$; we have gotten rid of the factor $\log\log n$
in Theorem S, but the method really {\sl requires} an unbounded
random-access memory since the table lookup cannot be done efficiently
with a finite number of tapes. (Of course, a factor of $\log\log n$
is utterly negligible in practice; when $n$ changes from
$10↑9$ to $10↑{18}$, $\lg\lg n$ increases by only one.)
Perhaps $O(n\log n\log\log n)$ will turn out to be the fastest
achievable multiplication speed, on the tape model, and $O(n\log n)$
on the unlimited random-access model; no such result
has yet been proved. The best lower bound known to date is a
rather deep theorem proved by Michael S. Paterson, Michael J.
Fischer, and Albert R. Meyer [{\sl SIAM-AMS Proceedings \bf 7}
(1974), 97--111], based on techniques originally introduced
by S. A. Cook and S. Aanderaa, that under certain restrictions
there is no algorithm that multiplies $n$-bit numbers with
an average of fewer than
$$cn\log n/\log\log n\eqno (41)$$
operations, for some constant $c > 0$. The restrictions under which (41) is
a lower bound are rather severe: (a) The $(k + 1)$st input symbols
of the operands, from right to left, must not be read by the
algorithm until after the $k$th output symbol has been produced;
and (b) the internal tables kept by the algorithm must have
a ``uniform'' structure,
in an appropriate sense. The latter restriction rules out algorithms
that use general List structures for their internal tables,
and the first restriction rules out both Algorithm C and Algorithm
S. It is still conceivable (though unlikely) that an algorithm
which violates (a) or (b) could multiply $n$-bit numbers in
$O(n)$ cycles. M. J. Fischer and L. J. Stockmeyer have shown
[{\sl J. Computer and System Sciences \bf 9} (1974), 317--331]
that multiplication under restrictions (a) and (b) is possible
in $O\biglp n(\log n)↑2\log\log n\bigrp$ steps.
\subsectionbegin{D. Division} Using a fast multiplication
routine, we can now show that division can also be accomplished
in $O(n\log n\log\log n)$ cycles.

To divide an $n$-bit
number $u$ by an $n$-bit number $v$, we may first find an $n$-bit
approximation to $1/v$,
then multiply by $u$ to get an approximation ${\A q}$ to
$u/v$, and, finally, we can make the slight correction necessary
to ${\A q}$ to ensure that $0 ≤ u - qv < v$ by using another
multiplication. From this reasoning, we see that it suffices
to have an algorithm that approximates the reciprocal of an
$n$-bit number, in $O(n\log n\log\log n)$ cycles. The following
algorithm achieves this, using ``Newton's method'' as explained
at the end of Section 4.3.1:
\algbegin Algorithm R (High-precision reciprocal).
Let $v$ have the binary representation $v = (0.v↓1v↓2v↓3\ldotsm)↓2$,
where $v↓1 = 1$. This algorithm computes an approximation
$z$ to $1/v$, such that
$$|z - 1/v| ≤ 2↑{-n}.$$

\algstep R1. [Initial approximation.] Set
$z ← {1\over 4}\lfloor 32/(4v↓1 + 2v↓2 + v↓3)\rfloor$ and $k ← 0$.
\algstep R2. [Newtonian iteration.] (At this point we have a
number $z$ of the binary form $xx.xx\ldotsm x$ with $2↑k
+ 1$ places after the radix point, and $z ≤ 2$.) Calculate $z↑2
= xxx.xx\ldotsm x$ exactly, using a high-speed multiplication
routine. Then calculate $V↓kz↑2$ exactly, where $V↓k = (0.v↓1v↓2
\ldotsm v↓{2↑{k+1}+3})↓2$. Then set $z
← 2z - V↓kz↑2 + r$, where $0 ≤ r < 2↑{-2↑{k+1}-1}$
is added if necessary to ``round up'' $z$ so it is a
multiple of $2↑{-2↑{k+1}-1}$. Finally,
set $k ← k + 1$.
\algstep R3. [Test for end.] If $2↑k < n$, go back to step R2;
otherwise the algorithm terminates.
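Here, for concreteness, is Algorithm R transcribed into Python; the value
$z$ is carried as an integer scaled by $2↑{2↑k+1}$, so that every step is
exact integer arithmetic. The scaling bookkeeping and the helper V are
ours, not part of the algorithm.

from fractions import Fraction

def reciprocal(vbits, n):              # vbits = [v1, v2, ...], v1 = 1
    def V(m):                          # (0.v1 ... vm)_2, scaled by 2^m
        return int(''.join(map(str, (vbits + [0] * m)[:m])), 2)
    k, Z = 0, 32 // V(3)               # R1: z = (1/4) floor(32/v'), scale 2^(2^0+1)
    while True:
        T = V((1 << (k + 1)) + 3) * Z * Z      # R2: V_k z^2, computed exactly
        # rescale 2z - V_k z^2 to 2^(k+1)+1 fractional bits; the truncating
        # shift of T supplies the round-up term r
        Z = (Z << ((1 << k) + 1)) - (T >> ((1 << (k + 1)) + 4))
        k += 1
        if (1 << k) >= n: break        # R3
    return Fraction(Z, 1 << ((1 << k) + 1))

# Example: reciprocal([1, 1, 0, 0], 4) == Fraction(43, 32), and
# |43/32 - 4/3| = 1/96 <= 2^(-4).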
This algorithm is
a modification of a method suggested by S. A. Cook. A similar
technique has been used in computer hardware [see Anderson,
Earle, Goldschmidt, and Powers, {\sl IBM J. Res.\ Dev.\ \bf 11}
(1967), 48--52]. Of course, it is necessary to check the accuracy
of Algorithm R quite carefully, because it comes very close
to being inaccurate. We will prove by induction that
$$z ≤ 2\qquad\hjust{and}\qquad |z - 1/v| ≤ 2↑{-2↑k}\eqno (42)$$
at the beginning and end of step R2.
For this purpose, let $\delta↓k = 1/v - z↓k$, where $z↓k$ is
the value of $z$ after
$k$ iterations of step R2. To start the induction on $k$, we
have $\delta↓0 = 1/v - 8/v↑\prime + (32/v↑\prime - \lfloor
32/v↑\prime \rfloor)/4 = \eta↓1 + \eta↓2$, where $v↑\prime
= (v↓1v↓2v↓3)↓2$, $\eta↓1 = (v↑\prime - 8v)/vv↑\prime$ satisfies
$-{1\over 2} < \eta↓1 ≤ 0$, and $0 ≤ \eta↓2 < {1\over 4}$.
Hence $|\delta↓0| < {1\over 2}$. Now suppose (42) has been
verified for $k$; then
$$\eqalign{\delta↓{k+1} = 1/v - z↓{k+1} ⊗= 1/v - z↓k - z↓k(1
- z↓kV↓k) - r\cr
⊗= \delta↓k - z↓k(1 - z↓kv) - z↑{2}↓{k}(v - V↓k) - r\cr
⊗= \delta↓k - (1/v - \delta↓k)v\delta↓k - z↑{2}↓{k}(v - V↓k) - r\cr
⊗= v\delta↑{2}↓{k} - z↑{2}↓{k}(v - V↓k) - r.\cr}$$
Now
$$0 ≤ v\delta↑{2}↓{k} < \delta↑{2}↓{k} ≤ (2↑{-2↑k})↑2 = 2↑{-2↑{k+1}},$$
and
$$0 ≤ z↑{2}↓{k}(v - V↓k) + r < 4(2↑{-2↑{k+1}-3}) + 2↑{-2↑{k+1}-1}
= 2↑{-2↑{k+1}},$$
so $|\delta↓{k+1}| ≤ 2↑{-2↑{k+1}}$.
We must still verify the first inequality of (42); to show
that $z↓{k+1} ≤ 2$, there are three cases:\xskip (a) $V↓k = {1\over
2}$; then $z↓{k+1} = 2$.\xskip (b) $V↓k ≠ {1\over 2} = V↓{k-1}$; then
$z↓k = 2$, so $2z↓k - z↑{2}↓{k}V↓k ≤ 2 - 2↑{-2↑{k+1}-1}$.\xskip
(c) $V↓{k-1} ≠ {1\over 2}$; then $z↓{k+1}
= 1/v - \delta↓{k+1} < 2 - 2↑{-2↑k-2} + 2↑{-2↑{k+1}} ≤ 2$, since $k ≥ 1$.
The running time of Algorithm R is bounded
by
$$2T(4n) + 2T(2n) + 2T(n) + 2T({1\over 2}n) + \cdots$$
cycles, where $T(n)$ is an upper bound on the time needed to do a multiplication
of $n$-bit numbers. When $T(n) = Cn\log n\log\log n$, we
have $T(4n) + T(2n) + T(n) + \cdots < T(8n)$, so division can
be done with a speed comparable to that of multiplication except
for a constant factor.
\subsectionbegin{E. An even faster multiplication method}
It is natural to wonder if multiplication of $n$-bit numbers
can actually be accomplished in just $n$ steps; we have come
from $n↑2$ down to $n↑{1+\epsilon}$, so perhaps we can squeeze the
time down even more. This is still an unsolved problem, as pointed
out above, but it is interesting to note that the best possible
time, exactly $n$ cycles, {\sl can} be achieved if we leave
the domain of conventional computer programming and allow ourselves
to build a computer that has an unlimited number of components
all acting at once.
A {\sl linear iterative array} of automata is a set of devices
$M↓1$, $M↓2$, $M↓3$, $\ldotss,$ which can each be in a finite set of
``states,'' at each step of the computation. The machines $M↓2$,
$M↓3$, $\ldotss$ all have {\sl identical} circuitry, and their state
at time $t + 1$ is a function of their own state at time $t$
as well as the states of their left and right neighbors at time
$t$. The first machine $M↓1$ is slightly different: its state
at time $t + 1$ is a function of its own state and that of $M↓2$,
at time $t$, and also of the {\sl input} at time $t$. The {\sl
output} of a linear iterative array is a function defined on
the states of $M↓1$.
Let $u = (u↓{n-1} \ldotsm u↓1u↓0)↓2$, $v =
(v↓{n-1} \ldotsm v↓1v↓0)↓2$, and $q = (q↓{n-1} \ldotsm q↓1q↓0)↓2$
be binary numbers, and let $uv + q = w = (w↓{2n-1} \ldotsm w↓1w↓0)↓2$.
It is a remarkable fact that a linear iterative array can be constructed,
independent of $n$, that will output $w↓0$, $w↓1$, $w↓2$, $\ldotss$
at times 1, 2, 3, $\ldotss$, if it is given
the inputs $(u↓0, v↓0, q↓0)$, $(u↓1, v↓1, q↓1)$, $(u↓2, v↓2, q↓2)$,
$\ldotss$ at times 0, 1, 2, $\ldotss\,$.
We can state this phenomenon in the language of computer hardware,
by saying that it is possible to design a single ``integrated
circuit module'' with the following property: If we wire together
sufficiently many of these devices in a straight line, with
each module communicating only with its left and right neighbor,
the resulting circuitry will produce the $2n$-bit product of
$n$-bit numbers in exactly $2n$ clock pulses.
Here is the basic idea behind this construction: At time 0,
$M↓1$ senses $(u↓0, v↓0, q↓0)$ and it therefore is able to output
$(u↓0v↓0 + q↓0)\mod 2$ at time 1. Then it sees $(u↓1, v↓1, q↓1)$
and it can output $(u↓0v↓1 + u↓1v↓0 + q↓1 + k↓1)\mod 2$, where
$k↓1$ is the ``carry'' left over from the previous step, at
time 2. Next it sees $(u↓2, v↓2, q↓2)$ and outputs $(u↓0v↓2
+ u↓1v↓1 + u↓2v↓0 + q↓2 + k↓2)\mod 2$; furthermore, its state
records the values of $u↓2$ and $v↓2$ so that machine $M↓2$
will be able to sense these values at time 3, and $M↓2$ will
be able to compute $u↓2v↓2$ for the benefit of $M↓1$ at time
4. Thus $M↓1$ arranges to start $M↓2$ multiplying the sequence
$(u↓2, v↓2), (u↓3, v↓3), \ldotss $, and $M↓2$ will ultimately
give $M↓3$ the job of multiplying $(u↓4, v↓4), (u↓5, v↓5)$,
etc.
%folio 394 galley 17 (C) Addison-Wesley 1978
Each automaton has $2↑{11}$
states $(c, x↓0, y↓0, x↓1, y↓1, x, y, z↓2, z↓1, z↓0)$, where
$0 ≤ c < 4$ and each of the $x$'s, $y$'s, and $z$'s is either
0 or 1. Initially, all devices are in state $(0, 0, 0, 0, 0,
0, 0, 0, 0, 0)$. Suppose that a machine $M↓j$, $j > 1$, is in state
$(c, x↓0, y↓0, x↓1, y↓1, x, y, z↓2, z↓1, z↓0)$ at time $t$,
while its left and right neighbors $M↓{j-1}$ and $M↓{j+1}$ are in the
corresponding states $(c↑l, x↑{l}↓{0}, \ldotss , z↑{l}↓{0})$ and
$(c↑r, x↑{r}↓{0}, \ldotss , z↑{r}↓{0})$. Then machine $M↓j$ goes into
state $(c↑\prime, x↑{↑\prime}↓{0}, \ldotss , z↑{↑\prime}↓{0})$ at time
$t + 1$, where
$$\eqalign{c↑\prime ⊗= \min(c + 1, 3)\quad\hjust{if $c↑l = 3$},\quad 0\quad\hjust{otherwise};\cr
x↑{↑\prime}↓{0} ⊗= x↑l\quad\hjust{if $c = 0$},\quad x↓0\quad\hjust{otherwise};\cr
y↑{↑\prime}↓{0} ⊗= y↑l\quad\hjust{if $c = 0$},\quad y↓0\quad\hjust{otherwise};\cr
x↑{↑\prime}↓{1} ⊗= x↑l\quad\hjust{if $c = 1$},\quad x↓1\quad\hjust{otherwise};\cr
y↑{↑\prime}↓{1} ⊗= y↑l\quad\hjust{if $c = 1$},\quad y↓1\quad\hjust{otherwise};\cr
x↑\prime ⊗= x↑l\quad\hjust{if $c ≥ 2$},\quad x\quad\hjust{otherwise};\cr
y↑\prime ⊗= y↑l\quad\hjust{if $c ≥ 2$},\quad y\quad\hjust{otherwise};\cr}\eqno (43)$$
and $(z↑{↑\prime}↓{2}z↑{↑\prime}↓{1}z↑{↑\prime}↓{0})↓2$
is the binary notation for
$$z↑{r}↓{0} + z↓1 + z↑{l}↓{2} + \left\{\vcenter{\baselineskip14pt
\halign{$#$,\hfill⊗\qquad if $c#$\hfill\cr
x↑ly↑l⊗=0;\cr
x↓0y↑l + x↑ly↓0⊗=1;\cr
x↓0y↑l + x↓1y↓1 + x↑ly↓0⊗=2;\cr
x↓0y↑l + x↓1y + xy↓1 + x↑ly↓0⊗=3.\cr}}\right.\eqno (44)$$
The leftmost machine $M↓1$ behaves in almost
the same way as the others; it acts exactly as if there were
a machine to its left in state $(3, 0, 0, 0, 0, u, v, q, 0, 0)$
when it is receiving the inputs $(u, v, q)$. The output of the
array is the $z↓0$ component of $M↓1$.
Table 1 shows an example of this array acting
on the inputs $u = v = (\ldotsm 00010111)↓2$, $q = (\ldotsm 00001011)↓2$.
The output sequence appears in the lower right portion of the
states of $M↓1$: $0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, \ldotss$, representing
the number $(\ldotsm 01000011100)↓2$ from right to left.
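The following Python fragment mimics the input/output behavior just
described, with a single loop standing in for the entire array; it shows
what the array computes, but of course not how the bounded-state automata
(43) and (44) manage to do only constant work per cell per step.

def online_multiply(ubits, vbits, qbits, steps):
    bit = lambda b, i: b[i] if 0 <= i < len(b) else 0   # least significant bit first
    carry, out = 0, []
    for t in range(steps):             # the array outputs w_t at time t + 1
        s = sum(bit(ubits, i) * bit(vbits, t - i) for i in range(t + 1))
        s += bit(qbits, t) + carry     # all products u_i v_j with i + j = t
        out.append(s & 1)
        carry = s >> 1
    return out

# online_multiply([1,1,1,0,1], [1,1,1,0,1], [1,1,0,1], 11) returns
# [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0]: the outputs shown in Table 1, i.e.,
# the bits of 23*23 + 11 = 540, least significant first.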
This construction is based on a similar one first published
by A. J. Atrubin, {\sl IEEE Trans. \bf EC--14} (1965),
394--399. S. Winograd [{\sl JACM \bf 14} (1967), 793--802] has
investigated the minimum multiplication time achievable in a
logical circuit when $n$ is given and when the inputs are available
all at once in coded form; see also C. S. Wallace, {\sl IEEE
Trans.\ \bf EC--13} (1964), 14--17.
R. P. Brent has shown that functions such as $\log x$, $\exp x$,
and $\arctan x$ can be evaluated to $n$ significant bits in $O\biglp
n(\log n)↑2\log\log n\bigrp$ steps, using high-speed multiplication [{\sl
JACM}, to appear].
|qleft \24skip {\:r EXERCISES
\exno 1. [22] The idea
expressed in (2) can be generalized to the decimal system, if
the radix 2 is replaced by 10. Using this generalization, calculate
2718 times 4742 (reducing this product of four-digit numbers
to three products of two-digit numbers, and reducing each of
the latter to products of one-digit numbers).
\exno 2. [M22] Prove that, in step C1 of Algorithm C, the value
of $R$ either stays the same or increases by one when we set
$R ← \lfloor\sqrt{Q}\rfloor$. (Therefore,
as observed in that step, we need not calculate a square root.)
\exno 3. [M23] Prove that the sequences $q↓k, r↓k$ defined in
Algorithm C satisfy the inequality $2↑{q↓k+1}(2r↓k)↑{r↓k} ≤
2↑{q↓{k-1}+q↓k}$, when $k > 0$.
\exno 4. [28] (K. Baker.)\xskip Show
that it is advantageous to evaluate the polynomial $W(x)$ at
the points $x = -r, \ldotss , 0, \ldotss , r$ instead of at the
points $x = 0, 1, \ldotss , 2r$ as in Algorithm C. The polynomial
$U(x)$ can be written $U(x) = U↓e(x↑2) + xU↓o(x↑2)$, and similarly
$V(x)$ and $W(x)$ can be expanded in this way; show how to exploit
this idea, obtaining faster calculations in steps C7 and C8.
\exno 5. [35] Show that
if in step C1 of Algorithm C we set $R ← \lceil\sqrt{2Q}\rceil + 1$
instead of $R ← \lfloor\sqrt{Q}\rfloor$, with suitable initial values of
$q↓0$, $q↓1$, $r↓0$, and $r↓1$, then (19) can be improved to
$t↓k ≤ q↓{k+1}2↑{\sqrt{2\log↓2 q↓{k+1}}}(\log↓2 q↓{k+1})$.
\exno 6. [M23] Prove that the six
numbers in (22) are relatively prime in pairs.
\exno 7. [M23] Prove (23).
\exno 8. [M27] Why does the fast Fourier multiplication algorithm
bother to work mod $(2↑N + 1)$ instead of mod $(2↑N - 1)$? It
would seem to be much simpler to do everything mod $(2↑N - 1)$,
avoiding a lot of miscellaneous minus signs in the formulas,
since $\omega = 2$ can be used to compute fast Fourier transforms
mod $(2↑{2↑n} - 1)$. What would go wrong?
\exno 9. [M20] What is ${\A{\A u}}↓{\!r}$ $\biglp$the result of
two successive Fourier transforms (32)$\bigrp$?
\exno 10. [M21] Where is condition (33) used?
\exno 11. [M26] If $n$ is fixed, how many of the automata in
the linear iterative array (43), (44) are needed to compute
the product of $n$-bit numbers? (Note that the automaton $M↓j$
is only influenced by the component $z↑{r}↓{0}$ of the machine
on its right, so we may remove all automata whose $z↓0$ component
is always zero whenever the inputs are $n$-bit numbers.)
\exno 12. [M50] Improve on the lower bound (41); is it impossible
for a general node-structure automaton (as described in Section
2.6) to multiply $n$-bit numbers in $O(n)$ cycles?
\exno 13. [M25] (A. Sch\"onhage.)\xskip What is a good upper
bound on the time needed to multiply an $m$-bit number by an
$n$-bit number, when both $m$ and $n$ are very large but $n$
is much larger than $m$, based on the results proved in this
section for $m = n$?
\exno 14. [M42] Write a program for Algorithm C, incorporating
the improvements of exercise 4. Compare it with a program for
Algorithm 4.3.1M and with a program based on (2), to see how
large $n$ must be before Algorithm C is an improvement.
{\:r 4.4. RADIX CONVERSION}

If men had invented arithmetic by counting with
their two fists or their eight fingers, instead of their ten
``digits,'' we would never have to worry about writing binary-decimal
conversion routines. (And we would perhaps never have learned as much
about number systems.) In this section, we shall discuss the conversion
of numbers from one radix representation to another.
%folio 395 galley 18 (C) Addison-Wesley 1978
|qleft \20skip |newtype {\:r Table 1
}|qctr \3skip |newtype MULTIPLICATION IN A LINEAR ITERATIVE
ARRAY
|qctr \15skip |newtype |tab \qquad \qquad \qquad \quad |tab
\qquad |tab \qquad \qquad \qquad \quad |tab |zfa ⊗Module $M↓2⊗⊗$Module
$M↓3\cr
\11skip |tab \quad |tab \quad |tab \quad |tab \quad |tab \quad
|tab \qquad |tab \quad |tab \quad |tab \quad |tab \quad |tab
\quad |tab |zfa ⊗|newtype ⊗⊗⊗⊗z↓2⊗⊗⊗⊗⊗⊗z↓2\cr
⊗x↓0⊗x↓1⊗x⊗⊗⊗⊗x↓0⊗x↓1⊗x\cr
c⊗⊗⊗⊗z↓1⊗⊗c⊗⊗⊗⊗z↓1\cr
⊗y↓0⊗y↓1⊗y⊗⊗⊗⊗y↓0⊗y↓1⊗y\cr
⊗⊗⊗⊗z↓0⊗⊗⊗⊗⊗⊗z↓0\cr
\9skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗0⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
0⊗⊗⊗⊗0⊗⊗0⊗⊗⊗⊗0\cr
⊗0⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗0⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
0⊗⊗⊗⊗0⊗⊗0⊗⊗⊗⊗0\cr
⊗0⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗0⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
0⊗⊗⊗⊗0⊗⊗0⊗⊗⊗⊗0\cr
⊗0⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗0⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
0⊗⊗⊗⊗0⊗⊗0⊗⊗⊗⊗0\cr
⊗0⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
⊗⊗⊗⊗1⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
1⊗⊗⊗⊗0⊗⊗0⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
⊗⊗⊗⊗1⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
2⊗⊗⊗⊗0⊗⊗0⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗0⊗0⊗0\cr
⊗⊗⊗⊗1⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗1⊗0⊗1⊗⊗⊗⊗0⊗0⊗0\cr
3⊗⊗⊗⊗1⊗⊗0⊗⊗⊗⊗0\cr
⊗1⊗0⊗1⊗⊗⊗⊗0⊗0⊗0\cr
⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
3⊗⊗⊗⊗1⊗⊗1⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗1\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
3⊗⊗⊗⊗1⊗⊗2⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
3⊗⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
⊗⊗⊗⊗1⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
3⊗⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
3⊗⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗1⊗0⊗0⊗⊗⊗⊗1⊗0⊗0\cr
⊗⊗⊗⊗0⊗⊗⊗⊗⊗⊗0\cr \cr
\12skip |newtype |tab \qquad \quad |tab \qquad |tab \qquad \quad
|tab \qquad |tab \qquad \qquad \qquad \quad |tab |zfa ⊗$Time⊗⊗Input⊗⊗Module
$M↓1\cr
\11skip |tab \qquad \quad |tab \qquad |tab \quad |tab \quad
|tab \qquad |tab \quad |tab \quad |tab \quad |tab \quad |tab
\quad |tab |zfa ⊗|newtype ⊗⊗⊗⊗⊗⊗⊗⊗⊗z↓2\cr
⊗⊗⊗⊗⊗⊗x↓0⊗x↓1⊗x\cr
⊗⊗⊗⊗⊗c⊗⊗⊗⊗z↓1\cr
⊗⊗v↓j⊗⊗⊗⊗y↓0⊗y↓1⊗y\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗z↓0\cr \cr
\10skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗1⊗⊗⊗⊗0⊗0⊗0\cr
0⊗⊗1⊗⊗0⊗⊗⊗⊗0\cr
⊗⊗1⊗⊗⊗⊗0⊗0⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗1⊗⊗⊗⊗1⊗0⊗0\cr
1⊗⊗⊗1⊗⊗1⊗⊗⊗⊗1\cr
⊗⊗1⊗⊗⊗⊗1⊗0⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗1\cr
⊗⊗1⊗⊗⊗⊗1⊗1⊗0\cr
2⊗⊗⊗0⊗⊗2⊗⊗⊗⊗0\cr
⊗⊗1⊗⊗⊗⊗1⊗1⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗1\cr
3⊗⊗⊗1⊗⊗3⊗⊗⊗⊗1\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗1\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗1\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗1\cr
⊗⊗1⊗⊗⊗⊗1⊗1⊗0\cr
4⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗⊗1⊗⊗⊗⊗1⊗1⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗1\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗1\cr
5⊗⊗⊗0⊗⊗3⊗⊗⊗⊗1\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗1\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗1\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗1\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
6⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
7⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
8⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
9⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
10⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗1\cr \cr
\2skip ⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
11⊗⊗⊗0⊗⊗3⊗⊗⊗⊗0\cr
⊗⊗0⊗⊗⊗⊗1⊗1⊗0\cr
⊗⊗⊗⊗⊗⊗⊗⊗⊗0\cr